Patent abstract:
The invention relates to a method for generating a final image from an initial image comprising an object that can be worn by an individual. Said method comprises the following steps: a) detecting the presence of said object in the initial image; b) superimposing a first layer on the initial image, the first layer comprising a mask at least partially covering the object on the initial image; c) changing the appearance of at least a part of the mask. The invention makes it possible to conceal all or part of an object in an image or a video. The invention also relates to an augmented reality method for use by an individual wearing a vision device on the face, and a device for fitting a virtual object.
Publication number: FR3053509A1
Application number: FR1656154
Filing date: 2016-06-30
Publication date: 2018-01-05
Inventors: Ariel Choukroun; Jerome Guenard
Applicant: FITTINGBOX
IPC main classification:
Patent description:

TECHNICAL FIELD OF THE INVENTION
The field of the invention is that of image processing and image synthesis.
More specifically, the invention relates to a method for obscuring an object in an image or a video.
The invention notably finds applications in the field of augmented reality, allowing the fitting of a pair of virtual glasses by an individual wearing a pair of real glasses at the time of trying on.
STATE OF THE ART
For an individual wearing a pair of corrective glasses, it is difficult to try on a new pair of glasses before purchasing them. Indeed, since the new frames are fitted at the time of trying on with lenses without any optical correction, the user can only see himself with the precision allowed by his visual impairment. Thus, an averagely myopic user must for example approach within twenty centimeters of the mirror in order to observe himself. He cannot therefore judge for himself whether the new frame suits him. This is all the more complex in the case of trying on a pair of sunglasses, where the tinted lenses greatly reduce the brightness, further reducing the visibility of the user.
Techniques for removing a pair of glasses from an image are known in the prior art, in particular in the context of facial recognition of people.
These techniques are based on the recognition of the face, in particular of characteristic points making it possible to detect the position of the eyes. This detection, coupled with learning of the differences between faces wearing a pair of glasses and faces not wearing one, makes it possible to reconstruct an image of an individual without a pair of glasses from an image of the individual wearing a pair of glasses.
The major drawback of this technique is that it statistically reconstructs the face from images of individuals taken from an identical angle of view, generally from the front. This technique works only in two dimensions and only considers the inside of the 2D envelope of the face on the image. In other words, all the elements of the pair of glasses that overlap the background rather than the face are not considered by this technique, which is detrimental for images presenting pairs of glasses wider than the face or when the face is not facing the camera in the image.
Another major drawback of this technique is that it only takes into account pairs of glasses having in particular a very thin frame, thus excluding all pairs of glasses having a thick frame.
Techniques are also known in the prior art allowing an individual to see himself virtually on a screen by means of an avatar, while trying on a new pair of glasses.
These techniques are based on the prior acquisition of images of the individual not wearing a pair of glasses. These images are used to create a virtual model of the individual's head, onto which the model of the new pair of glasses is added.
The major drawback of this technique is that it does not provide a realistic technique where the wearer of glasses can see his image on the screen, as in a mirror, trying on a new pair of glasses.
Finally, there are augmented reality systems that allow you to try on a pair of virtual glasses.
At present, none of the existing augmented reality systems make it possible to virtually remove a real object, such as for example a pair of glasses, from one or more individuals carrying this object.
OBJECTIVES OF THE INVENTION
The present invention aims to remedy all or part of the drawbacks of the state of the art cited above.
One of the main objectives of the invention is to propose a technique which allows a user wearing a real vision device to see himself on a screen, as in a mirror, without the real vision device on the face and to try a virtual object replacing on the screen the real vision device kept on the face.
Another objective of the invention is to propose a technique which is realistic for the user.
Another objective of the invention is to propose a technique which works in real time.
Another object of the invention is to provide a technique which allows the user trying on a virtual object to move his head in any direction.
An objective of the invention is also to propose a technique for removing the visible part of an object, in particular a pair of glasses, from an image or a video, including local light interactions such as lens reflections or cast shadows.
STATEMENT OF THE INVENTION
These objectives, as well as others which will appear subsequently, are achieved using a process for generating a final image from an initial image comprising an object capable of being carried by an individual.
The object may for example be a vision device worn on the face, such as a pair of glasses, or a device worn on the head comprising a frame and a display screen, such as a virtual reality, mixed reality or augmented reality headset. The object can also be any other accessory worn on the head of an individual such as a scarf, a hat, makeup, a jewel or a hairstyle.
The image is acquired by an image acquisition device which can be a camera, a photographic camera or a depth camera. The depth camera, well known to those skilled in the art, combines a camera and an infrared measurement of the distance of the elements from the objective. The image can be alone or included in a sequence of images, also called video.
According to the invention, the method for generating an image comprises the following steps:
a) detection of the presence of said object in the initial image;
b) superimposition of a first layer on the initial image, the first layer comprising a mask covering at least partially the object on the initial image;
c) modification of the appearance of at least part of the mask.
Thus, this process makes it possible to modify the visual appearance of the detected object by covering it with a mask whose appearance is modified. The mask comprises pixels covering an area, continuous or not, of the initial image. The mask can cover all or only part of the object. In the example of a pair of glasses, the mask may cover only the frame of the pair of glasses, the frame and part of the lenses, the frame and the lenses in their entirety, or only the lenses. It should be noted that the shadows cast by the glasses can also be covered by the mask.
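By way of illustration only, the following minimal Python sketch shows one possible reading of steps b) and c): a first layer carrying a mask is superimposed on the initial image and the appearance (here the color and opacity) of the masked pixels is modified. The array shapes, function names and flat replacement color are assumptions made for the example, not elements of the patent.

```python
import numpy as np

def apply_mask_layer(initial_image, mask, new_color=(90, 60, 30), opacity=0.9):
    """Superimpose a first layer containing a mask on the initial image (step b)
    and modify the appearance of the masked pixels (step c).

    initial_image : HxWx3 uint8 array (the acquired frame).
    mask          : HxW boolean array, True where the detected object is covered.
    new_color     : appearance given to the masked pixels (here a flat color).
    opacity       : blending factor of the layer over the initial image.
    """
    layer = initial_image.astype(np.float32).copy()
    layer[mask] = new_color                              # change the color of the mask pixels
    final = (1.0 - opacity) * initial_image + opacity * layer
    return final.astype(np.uint8)

# Usage: a frame with a fictitious rectangular mask covering the object.
frame = np.full((480, 640, 3), 200, dtype=np.uint8)
object_mask = np.zeros((480, 640), dtype=bool)
object_mask[200:260, 220:420] = True
result = apply_mask_layer(frame, object_mask)
```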
The change in the appearance of the mask corresponds to a change in the color and / or the opacity of some or all of the pixels of the mask.
In a particular embodiment of the invention, the modification of the appearance of the mask comprises a step of replacing the texture of part or all of the object on the final image.
Thus, it is possible for a user to wear a pair of glasses of a certain color and to see himself with the same pair of glasses in another color. The texture of the object is a representation of the external appearance of the object. The texture can for example be linked to the color of the object or to its constitution, such as the presence of different layers of porous or translucent materials. The texture can also be linked to the type of coating on the object, such as the presence of a layer of glossy or matt varnish.
In a particular embodiment of the invention, the modification of the appearance of the mask includes a step of determining the texture of part or all of the object, the texture reproducing the elements in the background of the object in order to obscure all or part of the object on the final image.
Thus, the object detected in the initial image is automatically obscured from the final image. In other words, the method of generating a final image from an initial image is a method of obscuring an object in an image.
In a particular embodiment of the invention, the mask also covers all or part of the shadow cast by the object.
Thus, the modification of the appearance of the mask also makes the shadows cast by the object invisible. For example, the shadow cast by a pair of glasses worn on the face of an individual can also be erased from the face, thereby increasing the realism of the concealment of the pair of glasses.
In a particular embodiment of the invention, the method for generating an image also includes the following step:
d) superimposition of a second layer on the initial image above the first layer, the second layer comprising at least one element partially covering the mask.
Thus, the elements included in the second layer are for example hair covering a branch of a pair of glasses, a hand located partially in front of the object. The superimposition of the different layers keeps the realism of the final image.
In a particular embodiment of the invention, the method for generating an image also comprises, before step b), the following steps:
determination of the orientation of the object with respect to a device for acquiring the initial image;
determination of a characteristic dimension of the object on the initial image.
The initial image acquisition device includes a photographic sensor and a photographic lens for converging real images onto the sensitive surface of the photographic sensor. The photographic lens includes at least one converging lens element. The image acquisition device can be for example a camera, a photographic camera or a webcam.
The orientation of the object relative to the image acquisition device corresponds to the angles formed by the object in a frame of reference of the acquisition device. This frame of reference may for example be an orthonormal frame, one axis of which coincides with the optical axis of the lens. In other words, the object whose orientation is determined is tracked during a sequence of images.
The characteristic dimension of the object can be for example the width of the frame of a pair of glasses.
In a particular embodiment of the invention, the method for generating an image also comprises, before step b), the following steps:
development of a model of the object;
elaboration of the mask from the geometric projection of the three-dimensional model on the first layer, the three-dimensional model having the same orientation and the same characteristic dimension on the first layer as the object.
Thus, the model representing the object is virtually superimposed on the object. It should be noted that the object model can include distorted and flattened two-dimensional images depending on the orientation and size of the real object. The object model can also be three-dimensional with or without thickness. The orientation and the characteristic dimension of the model correspond to parameters of similarity between the model of the object and the real object. The projection of the three-dimensional model makes it possible to obtain the mask. The mask can cover all or part of the result of the projection of the model on the layer. Note that the mask can also cover a larger area of the image than the projection.
In a particular embodiment of the invention, the development of the model of the object is carried out from at least one image of the object alone.
The generation of the object model can for example be carried out in a device dedicated to modeling, comprising a box in which the object is housed, and one or more image acquisition devices oriented towards the object. An image may be sufficient for the development of the model of the object, provided that it is a three-quarter view of an object having a plane of symmetry, such as for example a pair of glasses. More generally, the development of a model of the object is carried out from at least two images of the object, the images presenting the object from different angles.
In one embodiment of the invention, the object is worn on the face of an individual.
In a particular embodiment of the invention, the development of the model of the object is carried out from at least one image of the object worn on the face of the individual.
Thus, the individual can keep the object on his face during the generation of the model.
In a particular embodiment of the invention, the object comprises a frame extending on either side of the face, and at least one lens assembled to said frame.
Thus, the object can be a vision device such as for example a pair of glasses.
In a particular embodiment of the invention, the method for generating an image also includes a step of identifying the frame from among the frames previously modeled and stored in a database, the mask being produced from the model of the identified frame.
Thus, the projection on the first layer of the model of the identified frame and previously modeled makes it possible to obtain a realistic mask of the frame. It should be noted that the mask can include all or part of the projection of the frame on the first layer. An image area corresponding to a lens assembled to the frame, or to a drop shadow, can be added to the mask.
The identification can be carried out automatically by the process or manually by an individual. Manual identification can be carried out, for example, using the information entered by the manufacturer inside the frame of the pair of glasses.
In a particular embodiment of the invention, the identification of the frame is carried out by generating support curves which adjust to the contours of the frame.
In a particular embodiment of the invention, the identification of the frame is based on at least one of the following criteria:
the shape of the frame;
the color(s) of the frame;
the texture(s) of the frame;
a logo presented by the frame.
In a particular embodiment of the invention, the method of generating an image also includes a step of drawing up a representation of the environment of the object.
The environment includes all the elements surrounding the object on the image, as well as the background elements of the object on the image. The representation can be in the form of an image and / or a three-dimensional model. For example, in the case of a pair of glasses worn on a face, the representation of the environment may include a model of the face on which the pair of glasses is worn and / or an image corresponding to the background of the face.
In a particular embodiment of the invention, the step of modifying the appearance of the mask comprises the following substeps:
geometric projection of the representation of the environment on an intermediate layer superimposed on the first layer;
determination of the new color of a pixel of the mask as a function of the color of at least one pixel of the intermediate layer near the pixel of the mask.
Thus, the modification of the appearance of the mask makes it possible to obscure the object on the final image. The geometrical projection of the representation of the environment on the intermediate layer makes it possible to obtain an image on which the mask of the object is superimposed. In the case of a representation comprising a background image and a three-dimensional model, the geometric projection of the three-dimensional model on the intermediate layer produces an image superimposed on the background image. The intermediate layer thus presents a two-dimensional representation of the environment on which the mask of the object is superimposed.
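The following sketch illustrates, under simplifying assumptions, the two substeps above: the projected representation of the environment is taken here as an already rendered image (the intermediate layer), and each pixel of the mask receives a new color computed from nearby pixels of that layer. Function and parameter names are illustrative.

```python
import numpy as np

def fill_mask_from_environment(initial_image, mask, environment_layer, radius=2):
    """For each pixel of the mask, take the new color from the intermediate layer
    (the projected representation of the environment) in a small neighbourhood.

    initial_image     : HxWx3 array, the acquired frame.
    mask              : HxW boolean array, True on the object to obscure.
    environment_layer : HxWx3 array, projection of the environment representation.
    radius            : half-size of the neighbourhood averaged around each pixel.
    """
    out = initial_image.copy()
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        # new color = mean of the intermediate-layer pixels near the mask pixel
        out[y, x] = environment_layer[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    return out
```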
In a particular embodiment of the invention, the method for generating an image also comprises a step of detecting the presence of a face in the environment, the representation of the environment comprising a model of the detected face on which a texture of the face is applied.
The texture of the face is a two-dimensional image applied to the model. It should be noted that the model and the texture can advantageously be realistic. The detection of the presence of the face can be carried out by detecting characteristic points of the face, such as for example the edge of the temples, the tip of the nose, the chin, or even the corners of the eyes.
In a particular embodiment of the invention, the method for generating an image also comprises a step of determining the orientation of the face relative to the acquisition device, the model of the face being arranged substantially according to the previously determined orientation.
Thus, the three-dimensional model representing the face is realistically oriented in the virtual space corresponding to the scene acquired by the image.
In a particular embodiment of the invention, the mask covering at least partially the object worn on the face is developed from the geometric projection of the face model on the first layer.
Thus, the concealment of the object worn on the face is carried out using a mask developed from a projection of the face model and not from a projection of the object model. It should be emphasized that this embodiment makes it possible to dispense with the tracking of the object. In addition, the mask developed may not take into account the size of the object, in which case the size of the mask is established according to the size of the face. In the case of a pair of glasses worn on the face, the size of the mask is advantageously sufficient to cover most of the models of existing pairs of glasses.
In a particular embodiment of the invention, the method for generating an image also comprises the following steps:
analysis of at least one light source illuminating the face of the individual;
colorimetric transformation of all or part of the face model.
Thus, the model of the face is realistically illuminated compared to the real scene.
In a particular embodiment of the invention, the color of a pixel on the texture of the face is determined by means of an inpainting method from the colors of a patch near the pixel.
The patch corresponds to a plurality of pixels forming a continuous area. The shape of the patch can be square or rectangular, each side generally comprising between one and five pixels. A circular patch can be obtained by inserting a Gaussian filter inside a square patch. The method of inpainting, well known to those skilled in the art, makes it possible to complete the texture of the face, in particular in the case of the generation of the model of the face of an individual wearing a pair of glasses. Indeed, in this example, the frame or even the lenses mask part of the face.
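A minimal sketch of the patch idea is given below; it simply averages the valid pixels of a small square patch around the missing pixel, whereas the embodiments described hereafter place the patch on the perpendicular to the contour or on the vertical. The names and the patch size are assumptions for the example.

```python
import numpy as np

def inpaint_pixel(texture, valid, y, x, patch_half=2):
    """Return a color for the missing pixel (y, x) from the valid pixels of a
    square patch centred on it (simplified placement of the patch).

    texture : HxWx3 float array, texture of the face with holes.
    valid   : HxW boolean array, True where the texture is known.
    """
    h, w = valid.shape
    y0, y1 = max(0, y - patch_half), min(h, y + patch_half + 1)
    x0, x1 = max(0, x - patch_half), min(w, x + patch_half + 1)
    patch_valid = valid[y0:y1, x0:x1]
    if not patch_valid.any():
        return None                        # no known pixel close enough
    return texture[y0:y1, x0:x1][patch_valid].mean(axis=0)
```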
In a particular embodiment of the invention, the position of the patch is located substantially on the perpendicular with respect to the contour of the area comprising the missing pixels.
Thus, when a part of the face is obscured, the color of a pixel missing from the texture of the face is restored from a patch near the missing pixel, the patch being located on a perpendicular to the outline of the obscured area of the face.
In a particular embodiment of the invention, the position of the patch is located substantially on the vertical with respect to said pixel.
Thus, the method of inpainting respects the general typology of a face which includes on both sides a vertical area of hair covering a part of the temples.
In a particular embodiment of the invention, the color of a pixel on the texture of the face is determined by means of an inpainting method from the face model, previously established and oriented, the model of the face including a representation of the eyes.
In a particular embodiment of the invention, the method for generating an image also comprises a step of identifying at least one eye area on the texture of the face, the eye area corresponding to the position of an eye of the detected face.
The identification of an ocular zone on the texture of the face can be carried out by identifying the position of characteristic points of an eye, such as for example the outer and inner corners of the eye.
In a particular embodiment of the invention, the filling of the ocular zone is carried out by knowing the topology of the eye of the face detected.
The topology of the eye includes a parametric representation by means of curves, of the various areas of the eye, in particular of the iris and the eyelids.
Thus, filling the eye area is more realistic because it respects the position of the iris and the pupil. The iris filling can be done by an inpainting method recovering a nearby pixel in an area corresponding to the iris. In the case where the area corresponding to the iris covers empty pixels or those with no coherent values, the iris is restored according to a standard iris topology, possibly taking into account the color of the iris of the other eye detected.
In a particular embodiment of the invention, the development of the representation of the environment of the object worn on the face of an individual is carried out without detecting a face in the environment.
Thus, the method is used without detecting or tracking the face of an individual.
In a particular embodiment of the invention, the development of the representation of the environment comprises a sub-step for correcting the optical deformation due to a transparent element placed between the environment and a device for acquiring the initial image.
Thus, the geometric deformations in the image of the face or of the background caused for example by the refraction of a corrective lens of a pair of glasses placed on the face of an individual are corrected.
In a particular embodiment of the invention, the image generation method is applied to all or part of a sequence of images forming a video.
It should be noted that the video can be in the form of a recording or of a real-time stream, such as for example a video broadcast in streaming, a technique well known in itself. The video can also be a real-time stream coming from a camera and instantly visible on a screen.
In a particular embodiment of the invention, the representation of the environment and / or the model of the object are updated with each image of the sequence.
Thus, the representation and / or the model being updated from several images of the sequence are more and more representative of reality. An area masked by the object, such as the part of the face behind a pair of glasses, can thus be updated in the representation of the environment including a model of the face, when the individual turns his head. By turning the head, the device for acquiring the initial image takes images of the face from new viewing angles, which improves knowledge of the face.
In a particular embodiment of the invention, the representation of the environment and/or the model of the object is updated from a plurality of initial images taken from a plurality of distinct viewing angles.
The initial images taken from a plurality of distinct angles of view can come from one or more image acquisition devices oriented at different angles.
In a particular embodiment of the invention, the generation of the final image is carried out in real time from the initial image.
Thus, the processing of the acquired image is carried out within a short and guaranteed time. The processing time of an image acquired of the individual makes it possible in particular to display the processed image of the individual without any visible lag for the individual. The processing time is less than 1/10th of a second. The processing time is preferably, but not necessarily, less than the display time between two images, which is generally equal to 1/25th of a second. In other words, real-time processing makes it possible to display a video stream coming from a camera instantly on a screen, the images of this stream having undergone processing in a time sufficiently short not to be perceived by the human eye.
The invention also relates to an augmented reality method intended for use by an individual wearing a vision device on the face, comprising the following steps:
real-time acquisition of a video of the individual wearing the vision device on the face;
real-time display of the video in which the appearance of the vision device is totally or partially modified by the image generation process.
Thus, thanks to the image generation process, the individual can see himself live on a screen without the vision device worn on his face. This augmented reality process notably allows an individual who has to wear a pair of corrective glasses to see themselves on the screen with the same pair of glasses but with a different color and / or texture.
Advantageously, the vision device is totally or partly obscured from the video displayed in real time.
Thus, the individual having to wear a pair of corrective glasses sees himself on the screen without his pair of glasses while actually wearing it on the nose.
In one embodiment of the invention, the individual wearing the vision device tries on a virtual object which at least partially overlaps, in the video, the partially or completely obscured vision device.
Thus, a user wearing a pair of corrective glasses can virtually try a new pair of glasses while keeping his pair of corrective glasses allowing him to keep his visual comfort.
In a particular embodiment of the invention, the augmented reality method comprises a step of initializing the model of the face of the individual from at least one image of the individual not wearing the vision device on the face.
Thus, the individual first removes his vision device from his face for the generation of the model of the face, and puts it back after a given time. The image acquisition of the individual can be carried out with one or more image acquisition devices. The individual can make movements of his head so that the generation of the facial model is carried out from a plurality of facial images acquired from different angles of view.
In a particular embodiment of the invention, the augmented reality method comprises a step of initializing the model of the face of the individual from a plurality of images of the individual wearing the vision device, the images corresponding to different angles of view of the face.
Thus, the generation of the face model is carried out without the user having to remove his facial vision device.
In a particular embodiment of the invention, the augmented reality method comprises a step of initializing the model of the vision device from at least one image of said device acquired in a dedicated modeling device.
In another particular embodiment of the invention, the augmented reality method comprises a step of initializing the vision device model from at least one image of the individual wearing the vision device.
The invention also relates to an augmented reality device allowing the fitting of a virtual object by an individual wearing a vision device, the virtual object covering at least partially the vision device, the fitting device comprising:
at least one camera acquiring a video of the individual;
a processing unit for the acquired video, the processing unit at least partially obscuring the vision device on the majority or on all of the images of the video by means of an image generation process;
at least one screen displaying the processed video of the individual.
In a particular embodiment of the invention, the screen is vertical and the camera is fixed substantially in the plane of the screen.
Thus, this particular configuration allows an individual sitting or standing facing the camera to see himself live on the screen, as in a mirror.
In a particular embodiment of the invention, the device for fitting a virtual object comprises two cameras spaced apart by a distance of between thirty and fifty centimeters, parallel to an edge of the screen.
Thus, the individual being generally placed at a distance of between eighty centimeters and one meter from the screen in order to be able to touch the screen, the distance between the cameras is optimal to obtain two shots of the face making it possible to reconstruct the model and texture of the face realistically.
In a particular embodiment of the invention, the device for fitting a virtual object further comprises a third camera substantially on the median axis between the first two cameras.
Thus, the third camera makes it possible to obtain a front image of the individual, this image being displayed on the screen. The first two cameras improve the realistic modeling of the face and the pair of glasses worn by the user.
In a particular embodiment of the invention, the screen is tactile.
Thus, the user can select the virtual object to try. The virtual object can be a vision device such as a pair of eyeglasses or sunglasses, a facial accessory or even make-up.
In a particular embodiment of the invention, the display of the acquired and modified video is carried out in real time.
In other words, the device is an augmented reality device where the user can try out a virtual object and see themselves in real time on the screen.
In a particular embodiment of the invention, the device for fitting a virtual object comprises a device for acquiring the three-dimensional model of the vision device.
BRIEF DESCRIPTION OF THE FIGURES
Other advantages, aims and particular characteristics of the present invention will emerge from the following non-limiting description of at least one particular embodiment of the methods and devices which are the subject of the present invention, with reference to the appended drawings, in which:
- Figure 1 shows an embodiment of an augmented reality device allowing the fitting of a virtual object by an individual wearing a real vision device;
- Figure 2 shows a block diagram of an embodiment of a method for generating a final image from an initial image;
- Figure 3 shows in the form of a block diagram the steps of the image generation process with reference to Figure 2;
- Figure 4 illustrates the model of the eyes in the form of five views:
o 4a: a perspective view of the eye model;
o 4b: a side view of an eyeball model;
o 4c: a front view of an eye model;
o 4d: a side view of an eyeball model including the eyelid curves;
o 4e: a front view of an eye model illustrating the displacement of the iris;
- Figure 5 illustrates an example of texture acquired from a real face;
- Figure 6 illustrates the masks generated during the image generation process with reference to Figure 2;
- Figure 7 shows another embodiment of an augmented reality device allowing the fitting of a virtual object by an individual wearing a real vision device;
- Figure 8 shows in the form of a block diagram another embodiment of a method for generating a final image from an initial image;
- Figure 9 shows another embodiment of an augmented reality device used by an individual wearing a real vision device;
- Figure 10 shows a pair of glasses used in the exemplary embodiments of the invention;
- Figure 11 shows a screen displaying a video of an individual wearing a pair of glasses on the face;
- Figure 12 shows a screen displaying the video with reference to Figure 11 in which the pair of glasses is obscured.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
This description is given without limitation, each characteristic of an embodiment can be combined with any other characteristic of any other embodiment in an advantageous manner.
We note, as of now, that the figures are not to scale.
Preliminary description of the exemplary embodiments of the invention
FIG. 10 represents a pair of glasses 111 comprising a rigid frame 112 and two corrective lenses 113 assembled to the frame 112. The frame 112, symmetrical with respect to the median plane AA, comprises a face 112b and two branches 117 extending laterally on either side of the face 112b.
The face 112b comprises two circles 114 surrounding the lenses 113, a bridge 115 ensuring the spacing between the circles 114, as well as two studs 119. Two plates 116, each fixed to a circle 114, are intended to rest on either side of an individual's nose.
The two branches 117 are each fixed to a stud 119 of the face 112b by means of hinges 118, thus allowing the frame 112 to be articulated. In the open position of the frame 112, the face 112b is inclined at an angle of between 5° and 10° relative to the plane perpendicular to the plane formed by the axes of the branches 117. This angle generally corresponds to the pantoscopic angle of the pair of glasses 111, that is to say the angle of the face 112b with the vertical when the pair of glasses 111 is placed on the nose of an individual looking into the distance without tilting the head, the plane of the branches being horizontal. Each branch 117 ends with a sleeve 117b intended to rest on an ear of the individual. The frame 112 thus extends laterally on either side of the face of an individual wearing the pair of glasses 111.
It should be emphasized that the pair of glasses 111 used in the following two examples of embodiment of the invention is a nonlimiting example of a real object erased from an image or from a sequence of images by the method subject of the invention.
Example of a particular embodiment of the invention
FIG. 1 represents a device 100 for trying on a virtual object 110 by an individual 120 wearing the pair of glasses 111 on the face.
It should be emphasized that in the present nonlimiting example of the invention, the individual 120 is moderately short-sighted. Thus, without a pair of corrective glasses, the individual 120 can see clearly only up to approximately twenty centimeters.
The device 100 comprises a touch screen 130 fixed vertically on a support 131, a camera 132 centered above the screen 130, two peripheral cameras 133 and a processing unit 134.
In a variant of this embodiment, the device 100 further comprises a depth sensor measuring by infrared the distance of the elements from the camera. The depth sensor may include an infrared projector and an infrared wavelength photosensitive sensor. The photosensitive sensor being in the immediate vicinity of the projector, the density of the points of the image makes it possible to deduce therefrom a depth map indicating the distance of each point of the image relative to the sensor.
In another variant of this particular embodiment of the invention, the device 100 also comprises a scanner or a double sensor making it possible to acquire a model of the entire face of the individual 120.
When the individual 120 stands facing the screen 130, the individual 120 sees the image of his face 121 from the front, acquired in real time by the camera 132. In order to be able to touch the touch screen 130, the individual 120 stands at about an arm's length from the screen 130. The distance between the individual 120 and the touch screen 130 is between sixty and one hundred and twenty centimeters. The individual 120 wears the pair of glasses 111 in order to see the screen 130 clearly.
The two peripheral cameras 133 are fixed on a rail 135 parallel to the upper edge of the screen 130, symmetrically on either side of the camera 132. The distance between the two peripheral cameras 133 is between thirty and fifty centimeters. In the present example, the two cameras 133 are spaced forty centimeters apart, which makes it possible to obtain images of the face 121 of the individual 120 with an angle of view offset by approximately 20° with respect to a front view.
The processing unit 134 generates from each initial image of the sequence acquired by the camera 132 a final image of the face 121 of the individual 120 in which the pair of glasses 111 is obscured. In other words, the pair of glasses 111 is made invisible on the real-time display of the face 121 on the screen 130.
To this end, a virtual representation of the scene acquired by the camera 132 is created. This virtual representation comprises a three-dimensional model of the pair of glasses 111 positioned on a representation of the environment comprising a model of the face of the individual 120. The projection of the model of the pair of glasses 111 and of the representation of the environment creates a mask superimposed on the real pair of glasses on each image of the sequence acquired by the camera 132.
It should be noted that for virtual representation, a virtual camera replaces camera 132 with the same viewing angle and the same magnification. In other words, the optical characteristics of the virtual camera are identical to those of the camera 132.
As illustrated in FIG. 2, the processing unit 134 thus generates a new image 210 from each image 220 of the sequence 200 acquired by the camera 132 according to a method 300 for generating an image.
FIG. 3 represents in the form of a block diagram the generation process 300.
In a first step 310, the generation method 300 detects the presence of the pair of glasses 111 in the initial image 220.
The generation method 300 determines in a second step 320 the orientation of the pair of glasses 111 relative to the camera 132.
The generation method 300 determines in a step 330 a characteristic dimension of the pair of glasses 111 on the initial image 220. The characteristic dimension is in the present nonlimiting example of the invention, equal to the width of the frame 112.
The generation method 300 elaborates in a step 340 a three-dimensional model of the pair of glasses 111 in a virtual space representing the real space acquired by the camera 132.
Step 340 of developing the model of the pair of glasses 111 comprises a first sub-step 341 of identifying the pair of glasses 111 among the pairs of glasses previously modeled and stored in a database linked to the processing unit 134. This identification can be made by knowing the reference of the frame and the so-called framemarking elements printed on it.
The identification of the pair of glasses 111 can also be carried out by automatic recognition from images of the pair of glasses worn by the user or acquired in a device dedicated to the acquisition of images of the pair of glasses alone, such as for example a light box. To this end, the automatic identification uses methods of indexing and visual recognition of the appearance of 3D objects well known to those skilled in the art, for example by generating support curves which adjust to the contours of the pair of glasses 111.
It should be emphasized that the visual recognition of the pair of glasses can be carried out using the following criteria:
the shape of the pair of glasses;
the color(s) of the pair of glasses;
the texture of the pair of glasses;
the presence of a notable characteristic of the pair of glasses or of a logo.
In the case where the sub-step 341 leads to a positive result where the pair of glasses is identified, the model of the pair of glasses 111 is extracted from the database during a sub-step 342.
In the opposite case where no pair of glasses from the database corresponds to the pair of glasses 111, the 3D model of the pair of glasses 111 is developed, during a sub-step 343, from images of the sequence 200 acquired by the camera 132, and possibly from the parameters representing the model closest in shape determined during the search step in the database.
It should be noted that the images in sequence 200 show the individual 120 wearing the pair of glasses 111 on his face. The model of the pair of glasses 111 is thus produced in real time from images acquired by the central camera 132 and by the peripheral cameras 133. When the head of the individual 120 tilts and / or turns, the cameras acquire images from a new angle of view. The model of the pair of glasses 111 is updated with each image, in particular when the image presents a view of the individual 120 from a different angle.
The model of the pair of glasses 111 developed during sub-step 343 is constructed by first creating a shape model of the face 112b of the pair of glasses 111 and a model of the branches 117 of the pair of glasses 111. It should be emphasized that in the case where the pair of glasses is not symmetrical, a model is created for each branch.
In a variant of this particular embodiment of the invention, the shape model of the face 112b also includes the plates 116.
In order to develop the model of the face 112b of the pair of glasses 111 and the model of the branches 117, a skeleton of the pair of glasses is used. The skeleton is extracted from a database of typical topologies of pairs of glasses. The typical topologies of pairs of glasses make it possible to classify the pairs of glasses according to the shapes of the glasses. The topologies are defined by:
a type of circle: whole circle, upper semicircle, lower semicircle, absence of a circle;
a circle shape: round, oval, rectangular;
a shape of studs;
a bridge or a bar connecting the two lenses, the bridge and/or the bar being able to be single or multiple;
two branches;
knowledge of differentiated parts of each of the above elements, such as the presence of a hole in a branch, an asymmetry between the circles, a protrusion on the frame ...
The thickness is determined around the skeleton of the pair of glasses by generating a closed 3D envelope which includes the pair of glasses 111.
The generation of the 3D envelope is carried out in the following three sub-steps:
creation of support curves in the planes perpendicular to the skeleton. These support curves correspond substantially to the sections of the frame 112;
generation of a 3D envelope in contact with the support curves;
creation of a mesh inside the 3D envelope.
It should be noted that the support curves used to generate the 3D envelope are derived from prior knowledge, drawn manually or learned statistically. The initialization of the support curves is generally carried out during the visual recognition step in an attempt to automatically identify the pair of glasses 111. The support curves are generated from images of the pair of glasses 111 worn on the face, or from images of the pair of glasses 111 acquired against a neutral background by a dedicated modeling device (not shown in FIG. 1).
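As an illustration of the principle only, the sketch below generates circular cross-sections in planes perpendicular to a skeleton polyline; consecutive cross-sections can then be meshed together to obtain a closed 3D envelope. Real support curves would follow the learned sections of the frame 112; the circular sections and the fixed radius are assumptions of the example.

```python
import numpy as np

def support_curves(skeleton, radius=2.0, samples=16):
    """Generate one circular support curve in the plane perpendicular to the
    skeleton at every skeleton point (a crude stand-in for frame cross-sections).

    skeleton : Nx3 array (N >= 2) of 3D points along the frame skeleton.
    Returns an (N, samples, 3) array of 3D points; consecutive curves can then
    be meshed to form a closed 3D envelope around the skeleton.
    """
    curves = []
    angles = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    for i, p in enumerate(skeleton):
        nxt = skeleton[min(i + 1, len(skeleton) - 1)]
        prv = skeleton[max(i - 1, 0)]
        tangent = nxt - prv
        tangent = tangent / (np.linalg.norm(tangent) + 1e-9)
        # orthonormal basis (u, v) of the plane perpendicular to the tangent
        helper = np.array([0.0, 0.0, 1.0]) if abs(tangent[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
        u = np.cross(tangent, helper)
        u = u / np.linalg.norm(u)
        v = np.cross(tangent, u)
        curve = [p + radius * (np.cos(a) * u + np.sin(a) * v) for a in angles]
        curves.append(curve)
    return np.asarray(curves)
```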
After the generation of the model of the pair of glasses 111 from the images of the individual 120 wearing the pair of glasses 111 on the face, the model of the pair of glasses 111 is then aligned with the real pair of glasses 111, during step 350. The model of the pair of glasses 111 thus has the same orientation with respect to the camera 132 and the same characteristic dimension in the image as the real pair of glasses 111. In other words, the model of the pair of glasses 111 is positioned in the virtual space, oriented according to the position of the virtual camera and scaled according to the dimension of the real pair of glasses 111. A magnification factor can thus be applied to the model of the pair of glasses 111. The pose parameters of the model of the pair of glasses 111 are denoted PeMg.
During step 355, the generation method develops a three-dimensional geometric model Ma of an avatar representing the face without the pair of glasses 111. A texture TaNG of the face without the pair of glasses 111 is also created during step 355. The geometric model Ma is configured in morphology and expressions according to the method for developing the model of the face described below.
The process for developing the avatar comprises a first step of detecting the face in the image and of facial analysis of the detected face. Face detection is performed in the present nonlimiting example of the invention by a Viola-Jones method, as explained in patent FR2955409.
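For illustration, the sketch below uses the Haar-cascade detector shipped with OpenCV, which implements a Viola-Jones type detection; it is not the specific implementation referred to in patent FR2955409, and the detection parameters are arbitrary.

```python
import cv2

def detect_faces(image_bgr):
    """Detect faces with OpenCV's Haar cascade (a Viola-Jones style detector).

    Returns a list of (x, y, w, h) rectangles, one per detected face.
    """
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```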
A feature alignment algorithm is then used to find specific facial features, in a second sub-step of the avatar development process. For this purpose, a feature detector well known to those skilled in the art is used, which allows internal features of the face to be found very reliably.
The HPAAM feature alignment algorithm, described in European patent application EP2678804, then makes it possible to precisely locate the projection of significant 3D features on the image. Unlike existing techniques, which give rise to localization errors in cluttered environments, HPAAM is particularly stable on features located on the contour of the face, such as points of the ears. Since the HPAAM algorithm is a technique involving a learning phase, the use of predetermined points having a 3D correspondence has an impact on the success of the global facial analysis technique, in particular with regard to its robustness and efficiency. Typically, this correspondence is specified for only a small number of points in 3D facial analysis techniques, such as the starting points of a 3DMM fitting strategy, in which five points are defined manually.
It should be emphasized that this face detection step is robust to the occultations of the face formed by the pair of glasses 111.
The second step of the avatar creation process concerns the estimation of the parameters θmodel of the face model, including:
the extrinsic parameters PeMa of the face model, that is to say the pose parameters of the face, including the position and the orientation of the face;
the intrinsic parameters PiMa of the face, that is to say the 3D morphology of the face; and possibly the facial expression models, as well as the extrinsic parameters of the eye system (translation TSE) and the eye configuration, which will be re-estimated for each image during tracking.
The parameters θmodel of the face model are estimated using a statistical geometric model of the morphology of the human face. For this purpose, a database of faces is used, such as the database described in the document by Blanz and Vetter published in 2003, entitled "Face Recognition Based on Fitting a 3D Morphable Model".
An estimation of the parameters θmodel of the face model and of the parameters θcam of the virtual camera is performed using the features found at the feature detection stage and by dynamically adjusting the contours in the image.
In order to estimate the intrinsic and extrinsic parameters θcam of the camera and the parameters θmodel of the face model, a minimization is performed of both the distance between the facial features found in the image and the projection Proj(Xs(i)) of the defined 3D semantic points of the parametric face, and the distance between the projection of the parametric face contours and the associated image edges.
The Proj(X) function represents the projective transformation of a 3D scene, such as the face or the pair of glasses, onto a layer or image plane, by considering a pinhole type camera model, well known to those skilled in the art, which performs a perspective division. Thus, the function Proj(X) makes it possible to pass from the 3D point X = (x, y, z) of the Euclidean space of the scene to the point (u, v) of the layer, taking into account the intrinsic camera parameters contained in the matrix K and the rigid transformation of the form RX + T, with R a 3x3 rotation matrix and T a 3x1 translation. When necessary, this projection will be noted Proj(X; K, R, T). It should be emphasized that the projection of the parametric face contours corresponds to the projection of the points of the face model whose normal is orthogonal to their direction of observation.
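A minimal sketch of such a Proj(X; K, R, T) function, assuming a pinhole camera without skew or distortion:

```python
import numpy as np

def proj(X, K, R, T):
    """Pinhole projection Proj(X; K, R, T) of 3D points X (Nx3) to image points (Nx2).

    Applies the rigid transform R·X + T, then the perspective division, then the
    intrinsic parameters contained in K (focal lengths and principal point).
    """
    Xc = X @ R.T + T                         # points in the camera coordinate system
    x = Xc[:, 0] / Xc[:, 2]                  # perspective division
    y = Xc[:, 1] / Xc[:, 2]
    u = K[0, 0] * x + K[0, 2]
    v = K[1, 1] * y + K[1, 2]
    return np.stack([u, v], axis=1)
```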
For this purpose, a sampling of the direction orthogonal to the contours at the current sample point is carried out, which makes it possible to sample the contours for several reasons: numerical efficiency, stability and compatibility with other alignment techniques used in 3D object tracking. To this end, for each iteration of the minimization, a computation of C(θcam, θmodel) is performed, which contains a subset of points Xj of the face model whose normal nj is orthogonal to their direction of projection, together with the associated image contour points contj = ProjContour(Xj, nj), where ProjContour is a function projecting the point Xj and searching, along the normal to the projection, for the best contour among multiple hypotheses. These hypotheses are computed locally, since the edges are computed along the normal direction and respect the amplitude of the signal, which leads to an edge detection which is precise and scale-invariant for the whole of the face. This method will hereinafter be called the normals method.
The cost function can for example be implemented using an M-estimator type approach, such as the one using the robust Tukey weight function. Alternatively, the residual can be computed for the nearest point along the normal direction to the contour among multiple hypotheses.
In the end, the equation to be resolved is written:
argmin_{θcam, θmodel} [ γ · Σ_{i=1..n} ||f_i − Proj(Xs(i))||² + (1 − γ) · Σ_j ||Proj(Xj) − contj||² ]    (1)

where ||·|| represents the Euclidean distance and γ is a parameter allowing more importance to be given to one of the two parts of the cost function: either the contours or the features. This equation can be solved using conventional gradient descent techniques well known to those skilled in the art.
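The sketch below evaluates the cost of equation (1) for given camera and model parameters, assuming that the 2D features f_i, their 3D semantic correspondences and the contour associations contj have already been computed, and that a projection function such as the proj() sketch above is supplied; it is an illustration, not the patented implementation.

```python
import numpy as np

def cost_eq1(features_2d, semantic_3d, contours_2d, contour_3d, K, R, T, proj_fn, gamma=0.5):
    """Cost of equation (1): gamma * feature term + (1 - gamma) * contour term.

    features_2d : Nx2 facial features detected in the image (f_i).
    semantic_3d : Nx3 corresponding 3D semantic points of the face model (Xs(i)).
    contours_2d : Mx2 image contour points contj found along the normals.
    contour_3d  : Mx3 model points Xj whose normal is orthogonal to the view direction.
    proj_fn     : projection function, e.g. the proj(X, K, R, T) sketch above.
    """
    feat_res = proj_fn(semantic_3d, K, R, T) - features_2d
    cont_res = proj_fn(contour_3d, K, R, T) - contours_2d
    return (gamma * np.sum(feat_res ** 2)
            + (1.0 - gamma) * np.sum(cont_res ** 2))
```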
The main advantage of this estimation technique is that when multiple images are available, as here in the image sequence 200, it extends to a multi-image analysis algorithm which relaxes the 2D/3D semantic correspondence constraint and allows all the estimated parameters to be refined. It can be used to find the best-fit morphology for all the images.
It should be noted that when a face scan is performed in 3D, producing in particular 3D data, for example using an infrared sensor or a depth camera of the RGB-D type (acronym for "Red-Green-Blue-Depth"), a 3D/3D constraint is added. Thus, for each point Xi of the face model, we try to minimize the distance between the point Xi of the face model and the nearest 3D point X̂i in the scanned data. The following term can thus be added to the minimization equation (1):
Σ_{i=1..n} ||Xi − X̂i||²
The third step in the avatar creation process involves adding 3D facial expressions.
The expressions add a certain variability to the face model and their exclusion allows a more stable and precise estimation of the parameters of pose and morphology of the face.
One approach usually used to create parametric variations of a mesh is to use blend shapes, that is, a set of geometric models combined linearly to produce unique instances. A technique commonly used to compute these blend shapes consists in deriving them statistically, as described in [A 3D Face Model for Pose and Illumination Invariant Face Recognition, Paysan et al., 2009].
The model has the following form:
g(α) = gm + αV, where g(α) is a vector representing a new shape and is written g(α) = (x1, y1, z1, ..., xn, yn, zn)^T, with (xi, yi, zi) being the i-th vertex, gm is the mean 3D shape, α is a vector which contains user-specific adaptation parameters and V is a matrix which contains the Statistical Shape basis. In general, the Statistical Shape basis includes only identity variations, without taking account of expression variations, so as to guarantee a good separability of the control parameters.
Nevertheless expressions are advantageously added to the model for the calculation in real time.
The 3D model is a wireframe model which can be deformed according to g(α, β) = gm + αV + βA, where β is a vector which contains the animation parameters, and A is a matrix which contains the Animation Units. As indicated in [CANDIDE-3 - An Updated Parameterized Face, Ahlberg, technical report, 2001], the Animation Units matrix makes it possible to ensure that the points tracked in 3D represent variations in expression.
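A minimal sketch of this deformable model g(α, β) = gm + αV + βA; the matrix dimensions and the matrix-times-vector convention are illustrative assumptions.

```python
import numpy as np

def deform(g_mean, V, A, alpha, beta):
    """Blend-shape model g(alpha, beta) = g_m + V·alpha + A·beta.

    g_mean : (3n,) mean 3D shape, flattened as (x1, y1, z1, ..., xn, yn, zn).
    V      : (3n, k) statistical identity basis, alpha: (k,) identity parameters.
    A      : (3n, m) Animation Units basis,      beta : (m,) expression parameters.
    Returns the deformed shape as an (n, 3) array of vertices.
    """
    g = g_mean + V @ alpha + A @ beta
    return g.reshape(-1, 3)
```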
Not only is this separation of parameters more powerful than conventional modeling, but it also simplifies real-time computation. Rather than estimating all of the 3D pose, face identity and expression parameters at each frame during the tracking process, the invariant identity parameters are provided by the facial analysis stage. Only the 3D pose parameters and a small number of expression variation parameters are estimated for each image.
The complete estimate of the parameters of the deformable shape and pose model is carried out on the basis of the resolution of:
min_{R, T, β} ||Proj(g(α, β); K, R, T) − p2D||²

where R is the 3D rotation matrix, T is the 3D translation, K is the matrix of intrinsic camera parameters, α is fixed during the facial analysis stage, and p2D is the current position in the image of the tracked 3D point.
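A sketch of this per-frame estimation using a generic non-linear least-squares solver (scipy) and a Rodrigues parameterization of the rotation; the solver choice, the initial depth value and the assumption that every model vertex has a tracked 2D position are illustrative, not prescribed by the text.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def fit_pose_expression(p2d, g_mean, V, A, alpha, K, n_expr):
    """Per-frame estimation of min over (R, T, beta) of
    || Proj(g(alpha, beta); K, R, T) - p2D ||^2, with alpha held fixed.

    p2d    : (n, 2) tracked 2D positions of the n model vertices in the image.
    g_mean : (3n,) mean shape; V: (3n, k) identity basis; A: (3n, m) animation units.
    alpha  : (k,) identity parameters estimated once at the facial analysis stage.
    """
    def residuals(params):
        rvec, T, beta = params[:3], params[3:6], params[6:]
        R, _ = cv2.Rodrigues(rvec.reshape(3, 1))              # rotation from Rodrigues vector
        pts = (g_mean + V @ alpha + A @ beta).reshape(-1, 3)  # g(alpha, beta)
        Xc = pts @ R.T + T                                    # camera frame
        u = K[0, 0] * Xc[:, 0] / Xc[:, 2] + K[0, 2]           # pinhole projection
        v = K[1, 1] * Xc[:, 1] / Xc[:, 2] + K[1, 2]
        return np.concatenate([u - p2d[:, 0], v - p2d[:, 1]])

    x0 = np.zeros(6 + n_expr)
    x0[5] = 600.0              # rough initial distance to the camera (arbitrary units)
    return least_squares(residuals, x0).x
```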
The face model comprises an eye model connected by a rigid translation TSE between the frame of reference of the face model and the frame of reference of the eye model.
As illustrated in FIG. 4, the two eyes 401 are represented by two centers of rotation 402, denoted PS, where S ∈ {R, L} corresponds either to the right side (S = R, for "Right") or to the left side (S = L, for "Left"). The two centers of rotation 402 are connected to the coordinate system of the eye system SE by a distance pdS, S ∈ {R, L}. Each eye 401 is oriented relative to the coordinate system of the eye system by the angles rxe and ryeS, S ∈ {R, L}, respectively rotations around the x and y axes. The centers of rotation 402 are at a distance dr from the center of a disc 403 of radius hdi representing the iris. The disc 403 is included in an element composed of three Bézier curves 410 of order 3 having the same start and end control points, pE0 and pE1, as shown in FIG. 4c. The curves of the edges of the eyes can be represented in 3D on the mesh of the face. It is important that the points pE0 and pE1 are at the intersection of the curves of the edges of the eyes, and that the curve which moves is parameterized by a parameter dpELy allowing the eyelid curve 410₃ to evolve between the values of the upper curve 410₁ and of the lower curve 410₂. This one-dimensional parameter can influence the 3D course of the eyelid curve 410₃ according to a curve defined in space.
It should be emphasized that the curves 410₁ and 410₂ are controlled by control points comprising respectively the points pEuL and pEuR, and the points pEdL and pEdR.
The 3D course of the eyelid curve 410₃ can be represented in the deformation modes of the configurable morphology model, as a function of the displacement of the eyelid position dpELy(t) given by the rotation rxEL around the x axis of the particular coordinate system of the eye, where t, between 0 and 1, parameterizes the position of a point on the eyelid curve 410₃.
It should be emphasized that the point of the eyelid curve 410₃ where t is equal to 0.5 corresponds to the midpoint. At this point, the position dpELy(t = 0.5) moves substantially on the disc of radius dr. We distinguish the configurations dpELy of the left and right eyes, L and R, which makes it possible to model the closing of a single eyelid, unlike the vertical rotation parameter rxe of the eye, for which in the vast majority of cases the movements of the left and right eyes are the same.
The relationship TSE allows the iris discs of the eyes, rotating around the points PL and PR, to touch the eyelid curves.
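The sketch below evaluates an order-3 Bézier eyelid curve sharing the control points pE0 and pE1, the moving curve 410₃ being obtained here by interpolating the intermediate control points of the upper and lower curves with the parameter dpELy; this linear interpolation of control points is an assumption of the example.

```python
import numpy as np

def bezier3(p0, c0, c1, p1, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    t = np.asarray(t, dtype=float)[..., None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * c0
            + 3 * (1 - t) * t ** 2 * c1 + t ** 3 * p1)

def eyelid_curve(pE0, pE1, ctrl_up, ctrl_down, dpELy, t):
    """Moving eyelid curve 410_3: its intermediate control points are interpolated
    between the upper curve 410_1 and the lower curve 410_2 by dpELy in [0, 1].

    ctrl_up, ctrl_down : pairs of intermediate control points of the upper and
                         lower curves (e.g. (pEuL, pEuR) and (pEdL, pEdR)).
    """
    c0 = (1 - dpELy) * ctrl_up[0] + dpELy * ctrl_down[0]
    c1 = (1 - dpELy) * ctrl_up[1] + dpELy * ctrl_down[1]
    return bezier3(pE0, c0, c1, pE1, t)
```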
From the image 220 where the face is detected, and for a pose PeMa0 and intrinsic face parameters PiMa0, the parameters TSE, rxEL, rxe, {pdS, ryeS}; S ∈ {R, L}, are estimated when the model is placed on the face for each image. The positioning parameters TSE of the eye system SE in the face frame of reference, as well as the pupillary distance parameters pdR and pdL, are considered to belong to the morphology of the user and no longer need to be re-estimated once they are stable. They can be solved with respect to the reprojection of the model in an image, or from a set of acquired images. The resolution of the pupillary distance parameters is described for example in patent FR 2 971 873.
The resolution of all the parameters TSE, rxEL, rxe, {pdS, ryeS}; S ∈ {R, L}, is based on the following elements, considering one or more acquired images:
in the case where we consider the difference between the projection of the model and the image: by a gradient descent method which minimizes the difference between the synthesized appearance of the face and the image, via a Lucas-Kanade type method;
in the case where we consider an alignment of the iris curve CI and of the eyelid curves CE on the contour image: by minimizing the distances between the contours. To solve this minimization of the distances between the contours, we consider the homologous points located on the normal to the contour. The curves being parametric, it is easy to sample them with:
• an angle parameter θ, θ ∈ [0, 2π[, for the iris curve, which is a circle;
• an evaluation parameter s, s ∈ [0, 1], of the curve for the eyelid, which is a Bézier curve of order 3.
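By way of illustration, the sampling of these two parametric curves can be sketched as follows in Python; the circle centre, its radius and the four Bézier control points are assumed to be already expressed in image coordinates, and the intermediate control point names are illustrative:

```python
import numpy as np

def sample_iris(center, radius, n=32):
    # Iris contour: circle sampled by the angle parameter theta in [0, 2*pi[
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    points = np.stack([center[0] + radius * np.cos(theta),
                       center[1] + radius * np.sin(theta)], axis=1)
    return points, theta

def sample_eyelid(pE0, pEu1, pEu2, pE1, n=32):
    # Eyelid contour: cubic Bezier curve sampled by the evaluation parameter s in [0, 1]
    s = np.linspace(0.0, 1.0, n)[:, None]
    points = ((1 - s) ** 3 * pE0 + 3 * (1 - s) ** 2 * s * pEu1
              + 3 * (1 - s) * s ** 2 * pEu2 + s ** 3 * pE1)
    return points, s.ravel()
```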
The difference between the sampled points C_I(θ) and C_E(s) of the model, projected at the pose Pe_Ma0 with the intrinsic parameters Pi_Ma0, and the contour image I_co of the face obtained with conventional operators of the Canny or Sobel type, is then measured.
It should be emphasized that the difference between the sampled points and the contour image can also be determined by a search along the normal, as described above.
It is also possible to resolve the pose parameters by generating distance maps of the projected contour model, and to project the contour points of the image in this map for resolution.
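A possible sketch of this distance-map variant, assuming the projected model contour is available as a binary image and the image contour points as an array of pixel coordinates (the function and variable names are illustrative):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def contour_cost(projected_contour_mask, image_contour_points):
    # Distance map: for every pixel, distance to the nearest pixel of the
    # projected model contour (contour pixels are the zeros of the inverted mask)
    dist_map = distance_transform_edt(~projected_contour_mask.astype(bool))
    cols = image_contour_points[:, 0].astype(int)
    rows = image_contour_points[:, 1].astype(int)
    # Cost of the current pose: sum of the distances of the image contour
    # points read in the distance map
    return dist_map[rows, cols].sum()
```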
The two types of equation to be solved are:
a) equation corresponding to the image difference for the resolution of the eye system:
argmin over {T_SE, rxE, ryE_S, pd_S, rxEL, hdi}, S = {R, L}, of || I(fproj_Ma,SE; K, Pe_Ma0, Pi_Ma0, dr) − I₀ ||², where K is the matrix of intrinsic parameters of the camera and I(fproj_Ma,SE; K, Pe_Ma0, Pi_Ma0, dr) is the image generated by the projection of the avatar model and of the eye system, taking into account the occultations of the eye system SE by the closing of the eyelids or by self-occultations due to the pose of the model. The generation of the image supposes a known texture. A resolution on learning parameters which vary the texture parametrically, of the active appearance model type, is therefore added during initialization. During the update, the specific texture of the scene is used. It should be emphasized that the contour difference is advantageously used for initialization, for reasons of performance and simplicity of the data.
b) equation corresponding to the contour difference for the resolution of the eye system: argmin over {T_SE, rxE, ryE_S, pd_S, rxEL, hdi}, S = {R, L}, of the sum over (θ, s) of || Proj(C_I, C_E; K, Pe_Ma0, Pi_Ma0, dr) − p(I_co; θ, s) ||², where the set of points of the image I_co is selected along the normal to the gradient of the considered projected curve, C_I or C_E, for the points of the curves associated with the values of the parameters θ and s.
In a variant of this particular embodiment of the invention, the ProjContour function is also used for the minimization relating to the eyes.
It should be noted that in order to make the system of equations robust on first initialization, the following default values are used:
hdi = 6.5 mm and dr = 10.5 mm for initialization, as well as the constraints pdR = pdL and ryeR = ryeL; the value of the rotation rxE is initialized to 0, corresponding to a statistical mean rotation of the face of 0 degrees over a significant learning set. This constrains the resolution of the eye system parameters.
These values are then re-estimated when the parameters are updated.
In the case where there are two calibrated images or the depth map, it is possible to easily find all the parameters. These equations can be coupled with the resolution of the extrinsic and intrinsic parameters of the face.
In the case where a depth map is available in addition to the image, the estimation of the extrinsic and intrinsic parameters of the face is improved. Indeed, these values are used to refine the estimation of the parametric model. If the parametric model does not fully match because its settings do not explain the depth, the face model is adapted to the surface by solving the system described in equation (1) for the 3D resolution of the face. We then obtain not merely an estimate of the facial parameters but a parametric metrological model of the user's face.
Once the alignment has been achieved between the face model, the eye model and the image 220, the textures of the face T_aNG and of the background, defined in more detail below, are updated during step 356 in order to correspond to the reality of the scene acquired by the camera 132.
The texture T_aNG 450, illustrated in FIG. 5, is an atlas of the face, calculated according to the conventional mesh unfolding methods well known to those skilled in the art. Once the 3D face is projected into the image, the visible faces oriented towards the camera, determined for example by z-buffer or culling methods, make it possible to fill the texture image T_aNG 450.
The textures of the eyes are distributed over the texture of the face and are broken down into three parts: the texture of the iris T_aNG_I 451, the texture of the white of the eye T_aNG_E 452, and the texture of the eyelid T_aNG_EL. These three elements can be incomplete during the acquisition but can be completed in a simple way, by interpolation for T_aNG_E and T_aNG_EL for the unknown zones to be synthesized, or by knowledge of the topology for the non-visible parts, such as the top of the iris if the eye is not wide open. The circular nature of the pupil and of the iris makes it possible to complete the texture according to a polar parameterization.
A background map T_bg is created in step 357.
The T_bg map corresponds to the background and to everything that is considered as belonging neither to the real pair of glasses 111 worn by the user, nor to the face, nor to any other element modeled explicitly, such as a hair model or a hand coming to overlay the face and the pair of glasses. The T_bg map is updated dynamically by following update rules such as those found in classical background subtraction techniques, referring to predominant color models for each of the pixels, using probability distributions and possible modes for the colors. Several models can be used, such as Gaussian mixtures, or mode estimates by kernel methods on histograms. This model is coupled with a dynamic temporal, and possibly spatial, updating model.
For example, the dynamic update model can proceed in the following way: as in [Active Attentional Sampling for Speed-up of Background Subtraction, Chang et al., 2012], for each pixel we take into account a property of temporality P_t, a property of spatiality P_s eliminating the isolated pixels, and a property of frequency over the last images of the video P_f making it possible to eliminate the pixels changing class too often, which may be due to noise. The product of these three values gives, for each pixel, a probability of belonging to the map and of being updated.
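By way of illustration, a minimal sketch of such a per-pixel update rule is given below in Python; the three property maps P_t, P_s and P_f are assumed to be already computed as values in [0, 1], and the update threshold is purely illustrative and not taken from the cited article:

```python
import numpy as np

def update_background(T_bg, image, P_t, P_s, P_f, threshold=0.5):
    # The product of the temporality, spatiality and frequency properties gives,
    # for each pixel, the probability of belonging to the map and of being updated
    p_update = P_t * P_s * P_f
    update_mask = p_update > threshold
    T_bg[update_mask] = image[update_mask]
    return T_bg
```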
The background map T_bg is initialized with all the pixels not considered, at this stage, as belonging to the projected face or to the projected pair of glasses. The background map has the same dimensions as the image 220.
For the sake of performance, and thanks to the additional analyses of the tracking and analysis models for statistically aberrant points, also called outlier points, the following modification method is used. This method comprises steps during which:
For each new image I, a face segmentation map T_a is calculated from the projection of the face model M_a in the image. In the same way, the projection of the glasses model M_g makes it possible to obtain the glasses segmentation map T_g.
For each of these maps, a pixel belonging to the projection of the model has a value of 1 while the other pixels have a value of 0. Remaining in the simple case where there are no other models, each pixel p is treated as follows:
• If T_a(p) = 0 and T_g(p) = 0, then T_bg(p) = I(p); • otherwise, the texture is not modified.
It is also possible to calculate a map which, for each pixel of T_bg, indicates the number of images acquired since the last update of the pixel, which makes it possible to evaluate whether a pixel has been modified recently or not. It is thus possible to evaluate whether the value of a pixel is relevant with respect to the nearby pixels, as a function of their respective times of last update. This modification method therefore favors recently modified pixels.
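A sketch of this update rule, assuming the segmentation maps T_a and T_g are binary images of the same size as the current image and that the age map is simply an integer counter per pixel (the names are illustrative):

```python
import numpy as np

def update_background_map(T_bg, age, image, T_a, T_g):
    # A pixel is copied into the background map only if it is covered neither
    # by the projected face model nor by the projected glasses model
    free = (T_a == 0) & (T_g == 0)
    T_bg[free] = image[free]
    # Age map: number of images acquired since the last update of each pixel
    age[free] = 0
    age[~free] += 1
    return T_bg, age
```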
A model of occultations by the elements coming to be superimposed on the face 121, such as for example a hand or a lock of hair, is developed during step 358.
The occultation map is represented by a dynamic texture T_fg which is updated with each image of the sequence 200. Any disturbance in the appearance models of the pair of glasses 111 and of the face 121 that has a spatial and temporal consistency is considered as an occultation; it is distinguished from the characterization of the lighting on the face 121, from the shadows cast by the pair of glasses 111 or by the face itself (the nose for example), and from the caustics created by the pair of glasses 111 on the face 121. The most probable cases are those of the hair or of a hand.
The blackout map is associated with a geometric model M fg which can be variable. This can be a plane that represents a layer in front of the 3D scene, or an estimated or available depth map.
The value of the occultation map is determined by the difference between the appearance prediction and the real image, that is to say by the difference between the projection of the virtual models representing the face, the pair of glasses and the background, and the real image. In other words, the occultation map includes all the elements that have not been modeled before.
In a variant of this particular embodiment of the invention, an inpainting technique is used in order to fill possible empty spaces in the occultation map, thus making it possible to improve its appearance.
Likewise, for small elements smaller than the size of a pixel in the image, such as for example a fine lock of hair present jointly with an element of the face or an element of the pair of glasses 111 on one pixel, the occultation map takes into account local degrees of opacity. This opacity modification is commonly used to solve digital matting problems. We call Tα_fg the gray-level opacity channel of the occultation map, and TB_fg its binarization, in which opaque pixels have the value 1.
In the case of the use of the depth sensor, the detection of occultations is easier and methods well known to those skilled in the art can be applied. However, in the present case where the user wears glasses, the RGBD sensors based on infrared technologies obtain a very bad signal because the pairs of glasses are objects generally made up of complex materials and with strong dynamics, like metal, translucent plastic and glass. The diffraction and refraction effects of these materials prevent the system for creating the depth map from functioning properly. On the other hand, the spatial resolution of these sensors is not sufficient for very thin glasses. As a result, not only are the glasses not or very poorly identified by the system, they also corrupt or make inaccessible all of the face data located in the vicinity and behind. The use of the image and parametric model of the proposed pair of glasses makes it possible to overcome these structural problems of depth sensors.
The textures of the pair of glasses, of the face and / or of the background are completed and updated during step 359 of the image generation process.
During the process, the state of the cards representing the elements displayed in the image changes as a function of the knowledge of the elements. In the present example, the face of the individual 120 is partially masked by the pair of glasses 111. New elements of the face of the individual 120 appear when the individual 120 turns his head. The color information can also be distorted because of the refraction of the glasses, in particular in the case where the glasses are tinted or because of the shadows cast by the pair of glasses 111 on the face.
Thus, it is possible that for a given pixel of an established map such as for example the background map, or that of the face, the color information is not available because the user has not yet moved enough to bring up the necessary area.
Statistical learning models are used on the facial area, but are less effective on the background. It is thus possible to replace the pixels of the face area with techniques known as active appearance models or morphable 3D models (in English "3D morphable models").
In case the appearance prediction is not possible, a filling technique by spatial locality is used. This filling technique, close to the inpainting techniques well known to those skilled in the art, is based on texture synthesis, providing relevant knowledge for the reliable and real-time resolution of the filling problem. Since the topology of the glasses models is known and the real-time constraint is important, patch-based filling is used, which guarantees the continuity of the colors between the inpainted areas and the respect of the structures of the textures. This technique makes it possible to quickly find similar elements in the image, as well as to process in parallel the majority of the pixels to be replaced.
The real-time filling technique is based on an inpainting technique well known to those skilled in the art.
The filling of the areas to be treated is done by pixel or by patch, using a three-step algorithm:
1. calculation of the priorities of the patches,
2. propagation of texture and structure information,
3. updating of the confidence values.
In the present nonlimiting example of the invention, the patch is formed by a square window centered around a pixel.
By knowing the topology of the various elements, such as the pair of glasses 111 and the face model, the filling of the zones is carried out in real time by bringing several advantages compared to the techniques commonly used by a person skilled in the art:
independence from the traversal direction; the possibility of working with an arbitrary patch size (down to the pixel); avoidance of the systematic and costly search for similar patches; and guaranteed color continuity during filling.
The calculation of the priority of the patches, well known to those skilled in the art, is retained for the areas for which no information is available, such as for example the pixels of the mask corresponding to the background area.
However, for the pixels located on the face area, knowledge of the face topology makes it possible to define the directions and priorities of the traversal and the a priori sampling areas of the patches. For example, if the eyes are hidden, the geometric model of construction of the eyes is known in advance parametrically, and the priority, the size of the patches and the direction of propagation can thus be adapted according to the curves linked to the particular topology of an eye.
In areas where there is no information on the underlying parametric structures, such as the background or the skin, it is the knowledge of the topology of the pair of glasses which makes it possible to predefine the traversal directions of the propagation of the structures, in the direction perpendicular to the skeleton of the shape or perpendicular to the contour.
It should be emphasized that the propagation of the structures is never far, regardless of the pose of the face, from the direction of propagation of the isophotes. Indeed, the pair of glasses, although it may have a thick frame, has projections in the image such that the different edges of the same sub-object are almost parallel.
By favoring the route specific to the topology of the pair of glasses, two aspects are improved. First, the patch search is reduced to the first patch found containing information in that direction. Naturally, at each iteration, the pixels replaced in the previous iteration are used, allowing continuity of the structure. The propagation direction is also predefined and is calculated only for patches judged to be very structured by a structural criterion. For example, the entropy of the patch in question, or a coefficient depending on the norm of the gradient directions, can be used. This approach avoids a systematic and costly ranking of priorities as well as directions of propagation.
In order to guarantee color continuity while preserving the structure, and to avoid the directional smoothing that can be observed in "onion peel" type routes, the following process is used:
Let T0 be the patch to be filled, centered around the pixel p0, at a distance from the front of the mask to be filled such that the patch contains pixels from a known area. We define a maximum confidence distance d_max which guarantees the continuity of the structures, and we move in both directions along the direction normal to the contour n_c of the glasses mask to find the two full patches T1 and T2 (centered in pT1 and pT2) in the closest "texture" areas. This technique saves the calculations for finding the closest colorimetric patch. The color matching is then done to fill the pixels of the patch T0, taking into account the distance d1 from p0 to pT1 and the distance d2 from p0 to pT2, in order to allow the following linear interpolation:
pi(u,v)_T0 = [d2 / (d1 + d2)] · pi(u,v)_T1 + [d1 / (d1 + d2)] · pi(u,v)_T2, iff d1 < d_max and d2 < d_max, where each pi(u,v)_T corresponds to a pixel of a patch T and "iff" is the abbreviation of "if and only if".
In the other cases, we have:
pi(u,v)_T0 = pi(u,v)_T1, iff d1 < d_max and d2 ≥ d_max; pi(u,v)_T0 = pi(u,v)_T2, iff d1 ≥ d_max and d2 < d_max.
This process is called several times until all the pixels in the mask are processed.
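A minimal sketch of this two-patch interpolation, assuming the two full patches T1 and T2 and their distances d1 and d2 to the patch T0 have already been found along the normal to the contour (the names and the handling of the fallback cases follow the equations above):

```python
import numpy as np

def fill_patch(T1, T2, d1, d2, d_max):
    # Linear interpolation between the two known patches found on either side
    # of the mask, weighted by their distances to the patch to be filled
    if d1 < d_max and d2 < d_max:
        w1 = d2 / (d1 + d2)
        w2 = d1 / (d1 + d2)
        return w1 * T1 + w2 * T2
    if d1 < d_max:      # only the patch on the first side is within confidence
        return T1.copy()
    if d2 < d_max:      # only the patch on the second side is within confidence
        return T2.copy()
    return None         # no reliable patch within the confidence distance
```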
To avoid the onion-peel effects, and in order not to reproduce the structural artifacts due to the compression of the image, for small patches (down to the pixel) a random local displacement is added for the areas of weak structure. Uniform or Gaussian noise can be used. This noise is estimated as a function of the average noise of the surrounding areas known in the image, by techniques well known to those skilled in the art. The entropy can be used to order the structures, if the latter are not already known thanks to the model. The replaced area can be the pixels to be replaced from the full patch, or a smaller patch down to the pixel.
It should be noted that the patch sizes are dependent on the size of the structure to be replaced, namely the thickness of the pair of glasses, and the distance from the user to the camera.
Figure 6 illustrates the development of the masks from the image 220 of the user 120 (Figure 6a). As illustrated in FIG. 6b, the environment I_bg in the background of the pair of glasses 111 is broken down into several zones:
an area 470 corresponding to the face; and an area 471 corresponding to the background.
It should be emphasized that the area 470 can be subdivided into semantic sub-regions 472, corresponding for example to the hair region 472-1 and to the skin region 472-2.
During step 360, the generation method develops the mask of the pair of glasses 111 by geometric projection of the three-dimensional model of the pair of glasses 111 on a first layer.
It should be noted that the first layer is emptied before the geometric projection. Thus, the first layer includes an image of the three-dimensional model of the pair of glasses 111 according to the same angle of view and the same size as the real pair of glasses 111.
The mask TM_g of the pair of glasses 111 is divided into several parts:
the mask TM_gf of the frame 112b and of the branches 117; and the mask TM_gl of the lenses 113.
A mask TM_ge corresponding to the light effects cast on the face, in particular the caustics and the shadows, is created simultaneously. The TM_ge mask also includes the light effects on the lenses, in particular the reflections.
The TM_gf mask corresponds to the RGBA rendering image of the model of the pair of glasses 111 for the values of the face pose parameters Pe_Ma and of the glasses pose parameters Pe_Mg estimated at the instant corresponding to the image 220. The mask TM_gf takes into account possible occultations of the pair of glasses 111, such as for example a hand placed in front of the face or a lock of hair falling on the face.
A binary mask TMB_gf is obtained by binarizing the alpha layer of the rendering of the mask TM_gf. Since the alpha layer represents the transparency of the pixels, its binarization makes it possible to delimit the mask TM_gf.
FIG. 6c represents the environment I_bg represented in FIG. 6b, on which the mask TMB_gf is added.
The masks TM_gl and TM_ge are determined using the same technique as for the mask TM_gf, considering respectively the lenses 113 and the light effects such as the reflections on the lenses or the shadows cast on the face.
The mask TM_a corresponding to the face is created during step 365 from the model of the face including the eyes, according to the orientation and positioning parameters of the face previously estimated for the image 220.
It should be emphasized that the binary mask TMB_gf of the pair of glasses is contained in the face region TM_a or in the background map T_bg, as can be seen in FIG. 6c.
Thanks to the topological knowledge of the pair of glasses object, a sampling is carried out in a locality defined on either side of the mask TMB_gf, according to a parameterization given by the topology of the pair of glasses, in the direction of the normal to the contour n_c.
Thus, the branch is sampled on each side over zones of maximum size representing a partition Ω_R of the regions defined by TM_a or T_bg. In the present case, an adjustment of the division of the space is carried out with the border curves of the regions. Thanks to this division, it is possible to estimate the field of local colorimetric transformations between the appearance prediction and the current image, for the face region TM_a and the background region T_bg deprived of the glasses region TM_g, which makes it possible to find the transformations due to changes in the overall light, or to cast shadows.
For the face, areas that do not respond to this colorimetric dynamic may not be considered at first, such as the eyebrows, the hair or the beard, in order to focus on the skin, which follows a pseudo-Lambertian dynamic and allows low and medium frequency color adaptation. These areas are identified and segmented using the points and curves found during the recognition of the characteristics and can be refined in the texture space. The transformation is then calculated on the zones of the same type, as in the color transfer or tone mapping techniques well known to those skilled in the art.
This field of colorimetric transformations is applied respectively to the images TM_a and T_bg to form the maps TM_a_Wc and T_bg_Wc. It should be emphasized that the colorimetric transformation is carried out on colorimetrically coherent sub-regions of the images TM_a and T_bg. These coherent sub-regions can advantageously be included in a semantic sub-region 472 in order to improve the final result. In addition, the colorimetric transformation takes into account the differences in dynamics between the sub-regions of these spaces.
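One simple way to estimate and apply such a transformation on a colorimetrically coherent sub-region is to match the color statistics of the prediction to those of the current image; the sketch below uses a per-channel mean and standard-deviation transfer, which is only one possible choice among the color transfer techniques mentioned above:

```python
import numpy as np

def color_transfer(prediction, current, region_mask, eps=1e-6):
    # Estimate, on one coherent region, the transformation mapping the predicted
    # appearance to the colors observed in the current image, then apply it
    out = prediction.astype(np.float32)
    src = prediction[region_mask].astype(np.float32)
    dst = current[region_mask].astype(np.float32)
    gain = dst.std(axis=0) / (src.std(axis=0) + eps)
    offset = dst.mean(axis=0) - gain * src.mean(axis=0)
    out[region_mask] = src * gain + offset
    return np.clip(out, 0, 255).astype(prediction.dtype)
```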
These new images TM_a_Wc and T_bg_Wc are used to analyze the pixels of the current image I whose color is not determined by the prediction, in particular in the lens and face areas, in order to detect the reflections and the light and geometric modifications due to the lenses (TM_gl), as well as the shadows cast by the frame (TM_ge) in the associated regions. This technique makes it possible in particular to correct the deformations of the face due to the optical correction of the lenses of the pair of glasses 111 worn by the user 120.
We thus fill the TMB_gl and TMB_ge maps for each pixel x of the regions considered, according to the measurement:
∀x ∈ Ω_R, x = 1 if ||TM_a_Wc(x) − I(x)||² < e, and 0 otherwise,
with Ω_R a colorimetrically consistent sub-region of the region Ω = {x; TMB_a(x) = 1} ∪ {x; TMB_bg(x) = 1}. The threshold e is large enough to include aliasing colors and to avoid compression and sensor image artifacts, and the mask can then be dilated based on the confidence in the 3D object knowledge and in the registration.
FIG. 6d represents the image represented in FIG. 6c, on which the map TMB_ge representing the light effects, reflections and shadows is added.
The map of pixels to replace, TMB_g, is the union of the maps TMB_gl, TMB_ge and TMB_gf, deprived of the pixels of the occultation alpha map TB_fg:
TMB_g = (TMB_gl ∪ TMB_ge ∪ TMB_gf) \ TB_fg
The occultation alpha map TB_fg represents the opaque pixels of the occultation map T_fg, that is to say the pixels of T_fg whose alpha value is equal to 1.
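These two steps can be sketched as follows, assuming the adjusted predictions TM_a_Wc and T_bg_Wc, the region masks TMB_a and TMB_bg, the frame mask TMB_gf and the occultation alpha map TB_fg are arrays of the same size as the image; the lens and light-effect maps are merged into a single map here for brevity, and the threshold value is illustrative:

```python
import numpy as np

def pixels_to_replace(I, TM_a_Wc, T_bg_Wc, TMB_a, TMB_bg, TMB_gf, TB_fg, e=400.0):
    I = I.astype(np.float32)
    # Squared color difference between the adjusted predictions and the image
    diff_face = np.sum((TM_a_Wc.astype(np.float32) - I) ** 2, axis=2)
    diff_bg = np.sum((T_bg_Wc.astype(np.float32) - I) ** 2, axis=2)
    # Measurement above: a pixel of the considered region is set to 1 when its
    # difference to the prediction is below the threshold e
    TMB_ge = (TMB_a.astype(bool) & (diff_face < e)) | \
             (TMB_bg.astype(bool) & (diff_bg < e))
    # Map of the pixels to replace: union with the frame mask, deprived of the
    # opaque pixels of the occultation alpha map
    return (TMB_ge | TMB_gf.astype(bool)) & ~TB_fg.astype(bool)
```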
FIG. 6e represents the image represented in FIG. 6d, on which the occultation alpha map TB_fg is added.
The modification of the appearance of the mask representing the pair of glasses 111 is carried out during step 370.
From the image 220 and from all the masks created, the modification of the appearance replaces the pixels of the image 220 corresponding to the binary mask TMB_g by adequate values which make it possible to make the targeted parts of the pair of glasses 111 disappear, or to apply a treatment to them, in the image 220.
The colors chosen can be derived from the following techniques or from their combinations:
a. prediction colors corresponding to the geometric and colorimetric adjustment parameters;
b. colors learned statistically offline associated with a shape model;
c. colors without prior knowledge which guarantee spatial coherence and color continuity, and which can be combined with prior knowledge of the shape;
d. colors statistically learned during process 300.
In all cases, a color continuity constraint around the borders of the mask is implicitly or explicitly integrated.
The preferred technique in this example is color replacement by prediction, because it best manages the discontinuities of the model. Even if it can be sensitive to an estimation error, the addition of a dilation of the mask as well as a constraint of color continuity makes it possible to propose replacement results that are not detectable by the human eye. Thanks to the calculated maps TM_a_Wc and T_bg_Wc, and to the map T_fg, all the pixels can be replaced in the majority of cases.
The final image 210 is then generated during step 380 of the method 300 by flattening the different layers superimposed on the initial image 220, namely from the background:
the initial image 220;
the first layer comprising the mask of the pair of glasses 111; the second layer comprising the occultation mask.
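A minimal sketch of this flattening step, assuming each layer is stored as a floating-point RGBA image whose alpha channel is 1 on its mask pixels and 0 elsewhere (the layer list and names are illustrative):

```python
import numpy as np

def flatten_layers(initial_image, layers):
    # Composite the layers over the initial image, from the background to the
    # foreground, using the classical "over" alpha blending
    out = initial_image.astype(np.float32)
    for rgba in layers:  # e.g. [glasses_mask_layer, occultation_layer]
        alpha = rgba[..., 3:4].astype(np.float32)
        out = alpha * rgba[..., :3].astype(np.float32) + (1.0 - alpha) * out
    return out.astype(initial_image.dtype)
```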
Thus the individual 120 wearing the pair of glasses 111 sees his image on the screen 130, as in a mirror, without the pair of glasses 111 kept on his face. He can then virtually try on the new pair of glasses 110, which is positioned on the face in place of the real pair of glasses 111. The pair of virtual glasses 110 is positioned on the face 121 of the individual 120 thanks to an intermediate layer inserted between the first layer and the second layer. The intermediate layer includes a projection of a model of the virtual pair of glasses 110 realistically positioned on the face 121 of the individual 120.
For the technical details of the positioning of the pair of virtual glasses 110 or of the generation of the intermediate layer, the person skilled in the art can for example refer to request FR 10 50305 or request FR 15 51531 describing in detail techniques allowing the fitting of a pair of virtual glasses by an individual.
Another example of a particular embodiment of the invention
FIG. 7 represents a device 500 for trying on a virtual object 510 by an individual 520 wearing the pair of glasses 111 on the face.
The device 500 comprises a touch screen 530 fixed vertically on a support 531, a camera 532 centered above the screen 530, two peripheral cameras 533 and a processing unit 534.
The device 500 also includes a device 537 for measuring the distance of an element from the screen 530, comprising an infrared projector 535 projecting a pattern and an infrared camera 536.
The device 500 further comprises a modeling device 540 comprising a turntable 541 intended to receive a pair of glasses in its center, two fixed digital cameras 542 oriented towards the center of the turntable 541, and a solid-colored background 543 intended to be placed behind the modeled pair of glasses. The modeling device 540, connected to the processing unit 534, can thus actuate the turntable 541 and acquire images of the pair of glasses 111 from different viewing angles.
In a variant of this particular embodiment of the invention, the plate of the modeling device 540 is fixed. The modeling device 540 then comprises two additional fixed digital cameras oriented towards the center of the plate. The position of the two additional cameras corresponds to the 90 degree rotation of the position of the two cameras 542 around the central axis normal to the plate.
It should be emphasized that the modeling device 540 performs a calibration by acquiring for each camera 542 an image of the solid background 543 alone.
The individual 520 removes the pair of glasses 111 which he wears on the face and places them, branches 117 open, in the center of the turntable 541. In order to properly position the pair of glasses 111, marks are provided on the plate 541 .
The first camera 542-1, oriented so that its optical axis faces the pair of glasses 111, acquires a front image of the pair of glasses 111, then, after a 90° rotation of the turntable 541, a side image of the pair of glasses 111.
Simultaneously, the second camera 542-2 acquires top-view, 3/4 front and 3/4 rear images of the pair of glasses 111. The position of the camera 542-2 is thus raised, at about 45° relative to the median plane of the turntable 541.
A three-dimensional model of the pair of glasses 111 is created from the four images acquired from the pair of glasses 111 and two images of the background.
To this end, the pair of glasses 111 is segmented in each acquired image by making the difference between the background images and the images with the pair of glasses 111, which makes it possible to create binary masks of the different elements.
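A sketch of this segmentation by background difference, assuming the background image and the image with the pair of glasses were acquired by the same fixed camera under the same lighting (the threshold is illustrative):

```python
import numpy as np

def segment_glasses(image_with_glasses, background_image, threshold=25.0):
    # Per-pixel color difference between the image with the pair of glasses
    # and the image of the solid background alone
    diff = np.linalg.norm(image_with_glasses.astype(np.float32)
                          - background_image.astype(np.float32), axis=2)
    # Binary mask of the pixels attributed to the pair of glasses
    return diff > threshold
```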
For modeling, the frame 112 of the pair of glasses 111 is considered to be an assembly of three 3D surfaces:
a surface representing the face 112b of the pair of glasses 111; and a surface for each branch 117 of the pair of glasses 111.
It should be emphasized that since the pair of glasses 111 is symmetrical, the two branches 117 are similar and only the opening angle between each branch 117 and the face 112b can vary. Only a three-dimensional model of a branch 117 is thus generated. The three-dimensional model of the other branch 117 is then developed from the model of the first branch 117 symmetrically with respect to the main median plane of the first branch 117.
In order to estimate the 3D surface, a distance map is calculated for each of the images from the masks extracted from the segmentation. The 3D surface parameters are estimated via minimization respecting the criteria of central symmetry and continuity of the frame of the pair of glasses 111.
An estimation of a 2D contour of the face 112b and of the branches 117 is carried out from the binary masks of the face 112b and of the branches 117.
The 2D contour is then projected onto the corresponding 3D surface. A thickness is added to each of the 2D contours projected onto the surfaces to obtain the three-dimensional models of the face 112b and of the branches 117 forming the three-dimensional model of the pair of glasses 111.
To this end, from the points of the 2D contours, a Delaunay triangulation is performed. This triangulation is used on the points of the 3D surface to create the model of the pair of glasses 111. The images acquired from the pair of glasses 111 are applied in texture to the model of the pair of glasses 111.
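By way of illustration, the triangulation step can be sketched as follows, assuming the 2D contour points and their lifted 3D positions are available as matching arrays; SciPy's Delaunay triangulation is used here, which is only one possible implementation:

```python
import numpy as np
from scipy.spatial import Delaunay

def mesh_from_contour(points_2d, points_3d):
    # Triangulate the 2D contour points, then reuse the connectivity on the
    # corresponding points projected onto the estimated 3D surface
    tri = Delaunay(points_2d)
    faces = tri.simplices        # (n_triangles, 3) indices into the point arrays
    return points_3d, faces
```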
It should be emphasized that 3D statistical models of each element of the pair of glasses 111 can be used for the configuration and the mesh of the 3D surfaces from 2D contours.
An image of the individual 520 without glasses is acquired by the camera 532.
From the image of the individual 520 without glasses, a model M_a of an avatar representing the individual 520 is developed from the acquired images and from the measurements of the distance to the screen of the elements of the image, according to the method of developing the avatar model M_a described above in step 355 of the first exemplary embodiment.
A flattened texture of the face of the individual 520 is extracted from the avatar model M_a.
Before obtaining a 2D mask of the pair of glasses 111, the pair of glasses 111 is followed in the sequence of images acquired by the camera 132 by a tracking method 600 of the pair of glasses 111.
The tracking method 600, illustrated in the form of a block diagram in FIG. 8, comprises a first initialization step 610.
The initialization step 610 makes it possible to position the model M g of the pair of glasses 111 on the avatar M a and to open the branches of the model M g in the same way as the pair of real glasses 111 placed on the face of the individual 520.
For this, a first positioning of the model M g is done in 3D on the avatar M a so that the model of the pair of glasses 111 rests on the nose and ears of the avatar. The model M g is thus positioned according to calculated installation parameters. The pose parameters include the orientation relative to the camera and the magnification to be applied to the model M g in order to obtain the pair of glasses 111 displayed in the image.
The avatar is positioned and oriented according to the virtual camera having the same orientation and the same optical parameters as the camera 532. For this, the position and orientation of the face are determined on each image by means of a process of face tracking well known to those skilled in the art. Face tracking is based on the monitoring of characteristic points of the face. However, it should be noted that the characteristic points hidden in the image, in particular those found behind a pair of glasses or behind tinted lenses, are not taken into account in the tracking of the face.
A projection on a first layer superimposed on the initial image, of the model of the pair of glasses 111 positioned on the avatar makes it possible to obtain a mask of the pair of glasses 111.
In order to refine the position of the mask of the pair of glasses 111 on the first layer, the pose parameters are calculated by minimizing a cost function based on two components:
a component calculated as a function of the characteristic points of the face and of the eye system visible on the previous image in the sequence and as a function of prior images of the sequence; a component calculated as a function of the contours of the pair of glasses 111 in the image and of the model M g of the pair of glasses 111 previously synthesized.
After initializing the model M g of the pair of glasses 111, the tracking method 600 selects, during a second step 620, the set ω of the points of the model M g whose normal is substantially perpendicular to the axis formed between the point and the virtual camera.
It should be emphasized that in the case where the face 112b of the pair of glasses 111 is substantially parallel to the plane of the camera 132, the branches 117 being hardly visible, only the face of the model M_g is taken into account in the tracking of the pair of glasses 111.
It should also be emphasized that in the case where the face is strongly turned, making the face 112b not very visible, only the branch of the model M g is taken into account in the tracking of the pair of glasses 111.
During the third step 630, the tracking method 600 selects a subsample of n points from the set ω of points of the model M_g. The projections p2D_m, m = 1..n, of the n points on the image have a substantially uniform and regular spacing. Thus, when the face 112b of the pair of glasses 111 is almost parallel to the image plane of the camera, the subsample comprises a small or even zero number of points of the branches.
The vectors n2D_m, m = 1..n, corresponding to the projections of the normals of the n points of the set ω, are calculated during the fourth step 640.
From the projections p2D and the vectors n2D, the method 600 performs, for each index m, a search for the point p_grad_m of the image having the strongest gradient along the normal n2D_m passing through the projected point p2D_m.
The tracking method 600 then minimizes, during the fifth step 650, the function of the distance between the points p2D and p_grad. When the minimum value is reached, the position of the model M_g is considered to be representative of the actual position of the pair of glasses 111.
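A hedged sketch of the search for the point of strongest gradient along the projected normal, assuming the gradient magnitude image has been precomputed (for example with a Sobel operator) and that p2D and n2D are expressed in pixel coordinates; the search range is illustrative. The cost minimized in step 650 is then the sum over m of the squared distances between p2D_m and p_grad_m.

```python
import numpy as np

def strongest_gradient_along_normal(grad_mag, p2D, n2D, search_range=15):
    # Walk along the projected normal on both sides of the projected point and
    # keep the pixel with the strongest image gradient
    n = n2D / (np.linalg.norm(n2D) + 1e-9)
    best_point, best_value = np.asarray(p2D, dtype=float), -1.0
    h, w = grad_mag.shape
    for t in range(-search_range, search_range + 1):
        x = int(round(p2D[0] + t * n[0]))
        y = int(round(p2D[1] + t * n[1]))
        if 0 <= x < w and 0 <= y < h and grad_mag[y, x] > best_value:
            best_value, best_point = grad_mag[y, x], np.array([x, y], dtype=float)
    return best_point
```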
A mask covering the pair of glasses 111 is created from the projection of the model M g on the first layer.
The modification of the appearance of the mask of the pair of glasses 111 is carried out by replacing the color of the frame 112 of the pair of glasses 111 actually worn by the individual 520 with a new color.
The brightness is adjusted in order to make the modification of the color of the frame 112 realistic.
Thus, the individual 520 sees his image on the screen 530 with the same pair of glasses 111 but comprising a different color from the frame 112.
Another example of a particular embodiment of the invention
FIG. 9 represents an augmented reality device 800 used by an individual 820 wearing the pair of glasses 111 on the face. In this example, the pair of glasses 111 is fitted with corrective lenses adapted to the sight of the individual 820.
Individual 820 stands facing a camera 832 connected to a screen 830 displaying live image of the head 821 of individual 820 as in a mirror. The image displayed on the screen 830 shows the head of the individual 820 without the pair of glasses 111 on the face of the individual 820. The individual 820 can thus see himself clearly without his pair of glasses, as if he wore lenses.
In order to conceal in real time the pair of glasses 111 actually worn by the individual 820 on each image, from a given instant, of the sequence of images, also called video, displayed on the screen 830, a method of generating a final image from an initial image is used.
During this process, the pair of glasses 111 is detected and followed on each image of the sequence of images. A model of the pair of glasses 111 is generated and oriented in an identical manner to the pair of glasses 111 in order to create a mask by projection on a layer coming to be superimposed on the initial image.
The appearance of the mask covering the pair of glasses 111 is modified in order to erase on the screen the pair of glasses 111 worn on the face of the individual.
For this purpose, a planar map of the environment in the background of the pair of glasses 111 is created and updated dynamically by taking into account the information acquired with each frame of the video.
An inpainting method makes it possible to determine the color of each pixel of the mask of the pair of glasses 111 as a function of at least one pixel of the image near the pixel of the mask.
It should be emphasized that in the method used in the present example, the face is included in the environment of the pair of glasses 111 but is not detected for the preparation of the map representing the environment. Only the pair of glasses 111 is detected and followed.
In variants of this particular embodiment of the invention, the presence of the face of the individual 820 is detected but is not followed. A model of the face is thus generated and positioned in relation to the position of the pair of glasses followed in the image. The face model is used in projection for the development of the environment map. The face model can also be directly used by the inpainting method.
Individual 820 can try on a pair of virtual glasses or makeup and see themselves on the screen with it. It should be emphasized that in the case of the fitting of a virtual object, only the appearance of the visible part of the pair of glasses 111, that is to say not covered by the projection of the virtual object , can be advantageously modified, thus saving calculation time.
Another example of embodiment of the invention
FIG. 11 represents a screen 910 displaying a video 915 stored in a computer memory or a real-time video stream originating from a camera.
Video 915 shows the head of an individual 920 wearing the pair of glasses 111 on the face 921 before treatment.
FIG. 12 represents the screen 910 displaying the video 915 but in which the pair of glasses 111 is obscured on each image of the video by a method of generating a final image from an initial image according to the invention.
During this process, the face 921 is detected and followed on each frame of the video. On each image, the method adds a layer comprising an opaque mask covering the pair of glasses 111. It should be emphasized that the mask is dimensioned to cover most of the shapes and sizes of pairs of glasses. The mask is therefore not linked to the pair of glasses 111 which is not detected in the present example.
The method thus generates a layer for each image, on which the mask is oriented and sized in relation to the detected face.
For each layer, the generation process applies a texture to the mask from a model of the face previously established without the pair of glasses.
In order to make the final image realistic, the method includes a technique of "relighting" the texture of the mask, making it possible to adjust the colorimetry of the texture to the actual light illuminating the face 921.
In order to allow the analysis of light sources, techniques which are well known per se are used, such as stereophotometry or the so-called "shape from shading" technique, on the parts of the face, such as the skin, which follow a pseudo-Lambertian surface model. The light sources and their parameters are then used as a source of synthesis for the "relighting" of the face.
Holes can be established on each mask at eye level of the face 921 in order to make them visible on each image.
It should be emphasized that, for reasons of realism, the holes are not made in the mask when the eyes are optically deformed by the lenses of the pair of glasses 111 or when the lenses are tinted.
In the case where the holes are not made on the masks, a layer comprising pairs of resynthesized eyes is added on top of the mask layer.
The orientation of the synthesized eyes can advantageously be established from the actual orientation of the eyes detected and followed by techniques well known to those skilled in the art.
Other advantages and optional features of the invention
In variant embodiments of the invention, a real object to be erased from the image may be a hat, a scarf, hair or any other element partially or totally covering a face. The process can also be applied to any other real object that one seeks to obscure on an image, such as for example a garment worn by an individual.
In variant embodiments of the invention, an object to be placed on the face of an individual to replace the pair of glasses worn on the face is makeup, jewelry or even clothing. An individual wearing a pair of glasses can thus virtually try on make-up or evening wear by removing the pair of glasses worn in the image, thus making it possible to simulate the wearing of contact lenses. It should be noted that in the case of trying on a garment worn on the body of the individual, such as a suit or an evening dress, a scan of the morphology of the body of the individual may be useful to obtain a realistic rendering of the wearing of clothing.
In alternative embodiments of the invention, an individual wearing a pair of glasses sees himself on the screen with the same pair of glasses but with a frame having a color, a texture and / or materials different from those of the frame of the pair of glasses actually worn.
In variant embodiments of the invention, an individual wearing a pair of glasses sees himself on the screen with the same pair of glasses but with lenses of a different tint from that of the lenses of the pair of glasses actually worn.
In alternative embodiments of the invention, an individual wearing a pair of glasses is seen on the screen with the same pair of glasses but with glasses having a treatment different from that of the glasses of the pair of glasses actually worn . The treatment corresponds to the addition or removal of one or a combination of treatments well known to opticians, such as an anti-reflection treatment or a thinning of the glasses.
In alternative embodiments of the invention, an individual wearing a pair of glasses is seen on the screen trying on a new virtual pair of glasses, where the areas of the lenses of the real pair of glasses included in the image inside the rims of the virtual pair of glasses are preserved, thus making it possible to increase the realism of the pair of virtual glasses. In fact, by keeping part of the real lenses, the real reflections due to the environment are also preserved in the image. It should be noted that the color of the preserved part of the real lenses can be modified in order to obtain a virtual pair of glasses with tinted or non-tinted lenses, while retaining the real reflections on the lenses.
In variant embodiments of the invention, a virtual object is partially superimposed on the real object to be erased from the image and only the visible parts of the corresponding mask of the real object are modified.
In variant embodiments of the invention, the real object is partially erased from the image or mainly from the image.
Claims (46)
[1" id="c-fr-0001]
1. Method for generating a final image from an initial image comprising an object capable of being carried by an individual, characterized in that it comprises the following steps:
a) detection of the presence of said object in the initial image;
b) superimposition of a first layer on the initial image, the first layer comprising a mask covering at least partially the object on the initial image;
c) modification of the appearance of at least part of the mask.
[2" id="c-fr-0002]
2. A method of generating an image according to claim 1, characterized in that the modification of the appearance of the mask comprises a step of replacing the texture of part or all of the object on the image final.
[3" id="c-fr-0003]
3. A method of generating an image according to any one of claims 1 to 2, characterized in that the modification of the appearance of the mask comprises a step of determining the texture of part or all of the object, the texture reproducing the elements in the background of the object in order to obscure all or part of the object on the final image.
[4" id="c-fr-0004]
4. A method of generating an image according to any one of claims 1 to 3, characterized in that the mask also covers all or part of the shadow cast of the object.
[5" id="c-fr-0005]
5. Method for generating an image according to any one of claims 1 to 4, characterized in that it also comprises the following step:
d) superimposition of a second layer on the initial image above the first layer, the second layer comprising at least one element partially covering the mask.
[6" id="c-fr-0006]
6. A method of generating an image according to any one of claims 1 to 5, characterized in that it also comprises, before step b), the following steps:
determination of the orientation of the object with respect to a device for acquiring the initial image;
determination of a characteristic dimension of the object on the initial image.
[7" id="c-fr-0007]
7. A method of generating an image according to claim 6, characterized in that it also comprises, before step b), the following steps:
development of a three-dimensional model of the object; development of the mask by geometric projection of the three-dimensional model on the first layer, the model having the same orientation and the same characteristic dimension on the first layer as the object.
[8" id="c-fr-0008]
8. A method of generating an image according to claim 7, characterized in that the development of the model of the object is carried out from at least one image of the object alone.
[9" id="c-fr-0009]
9. A method of generating an image according to any one of claims 1 to 8, characterized in that the object is worn on the face of an individual.
[10" id="c-fr-0010]
10. A method of generating an image according to claim 9, characterized in that the development of the model of the object is carried out from at least one image of the object worn on the face of the individual.
[11" id="c-fr-0011]
11. A method of generating an image according to any one of claims 9 to 10, characterized in that the object comprises a frame extending on either side of the face, and at least one lens assembled to said mount.
[12" id="c-fr-0012]
12. A method of generating an image according to claim 11, characterized in that it also comprises a step of identifying the frame from among the frames previously modeled and stored in a database, the mask being produced from model of the identified frame.
[13" id="c-fr-0013]
13. A method of generating an image according to claim 12, characterized in that the identification of the frame is carried out by generating support curves which adjust to the contours of the frame.
[14" id="c-fr-0014]
14. A method of generating an image according to any one of claims 12 to 13, characterized in that the identification of the frame is based on at least one of the following criteria:
frame shape; frame color (s); frame texture (s); logo presented by the frame.
[15" id="c-fr-0015]
15. A method of generating an image according to any one of claims 1 to 14, characterized in that it also comprises a step of drawing up a representation of the environment of the object.
[16" id="c-fr-0016]
16. A method of generating an image according to claim 15, characterized in that the step of modifying the appearance of the mask comprises its following substeps:
geometric projection of the representation of the environment on an intermediate layer superimposed on the first layer;
determination of the new color of a pixel of the mask as a function of the color of at least one pixel of the intermediate layer near the pixel of the mask.
[17" id="c-fr-0017]
17. A method of generating an image according to any one of claims 15 and 16, characterized in that it also comprises a step of detecting the presence of a face in the environment and in that the representation of the environment includes a model of the detected face on which a texture of the face is applied.
[18" id="c-fr-0018]
18. A method of generating an image according to claim 17, characterized in that it also comprises a step of determining the orientation of the face relative to the acquisition device and in that the model of the face is disposed substantially according to the orientation previously established.
[19" id="c-fr-0019]
19. A method of generating an image according to any one of claims 17 to 18, characterized in that the mask covering at least partially the object worn on the face is produced from the geometric projection of the face model on the first layer.
[20" id="c-fr-0020]
20. Method for generating an image according to any one of claims 17 to 19, characterized in that it also comprises the following steps:
analysis of at least one light source illuminating the face of the individual;
colorimetric transformation of all or part of the face model.
[21" id="c-fr-0021]
21. A method of generating an image according to any one of claims 19 to 20, characterized in that the color of a pixel on the texture of the face is determined by means of an inpainting method from colors of a patch near the pixel.
[22" id="c-fr-0022]
22. Method for generating an image according to claim 21, characterized in that the position of the patch is situated substantially on the perpendicular and / or on the vertical with respect to said pixel.
[23" id="c-fr-0023]
23. A method of generating an image according to any one of claims 20 to 22, characterized in that Its color of a pixel on the texture of the face is determined by means of an inpainting method from the facial model, previously established and oriented, the facial model comprising a representation of the eyes.
[24" id="c-fr-0024]
24. A method of generating an image according to any one of claims 19 to 23, characterized in that it also comprises a step of identifying at least one eye area on the texture of the face, Its corresponding eye area at the position of an eye of the detected face.
[25" id="c-fr-0025]
25. A method of generating an image according to claim 24, characterized in that the filling of the ocular zone is carried out by knowing the topology of the eye of the face detected.
[26" id="c-fr-0026]
26. A method of generating an image according to any one of claims 15 and 16, characterized in that the development of the representation of the environment of the object worn on the face of an individual is carried out without detecting of face in the environment.
[27" id="c-fr-0027]
27. A method of generating an image according to any one of claims 15 to 26, characterized in that the development of the representation of the environment comprises a sub-step of correction of the optical deformation due to a transparent element placed between the environment and a device for acquiring the initial image.
[28" id="c-fr-0028]
28. A method of generating an image according to any one of claims 1 to 27, characterized in that it is applied to all or part of a sequence of images forming a video.
[29" id="c-fr-0029]
29. Method for generating an image according to claim 28, characterized in that Its representation of the environment and / or the model of the object are updated with each image of its sequence.
[30" id="c-fr-0030]
30. A method of generating an image according to any one of claims 15 to 29, characterized in that the representation of the environment and / or the model of the object is updated from a plurality of 'initial images taken from a plurality of distinct viewing angles.
[31" id="c-fr-0031]
31. A method of generating an image according to any one of claims 1 to 30, characterized in that the generation of the final image is carried out in real time from the initial image.
[32" id="c-fr-0032]
32. Augmented reality method intended for use by an individual wearing a vision device on the face, characterized in that it comprises the following steps:
real-time acquisition of a video of the individual wearing the vision device on the face;
real-time display of the video in which the appearance of the viewing device is totally or partially changed by the image generation method according to any one of claims 1 to 31.
[33" id="c-fr-0033]
33. Method of augmented reality according to claim 32, characterized in that the vision device is totally or partly obscured from the video displayed in real time.
[34" id="c-fr-0034]
34. Method of augmented reality according to any one of claims 32 to 33, characterized in that the vision device worn by the individual comprises corrective lenses adapted to the sight of the individual.
[35" id="c-fr-0035]
35. Augmented reality method according to any one of claims 32 to 34, characterized in that the individual wearing the vision device tries a virtual object superimposed at least partially in the video on the vision device partially or totally obscured .
[36" id="c-fr-0036]
36. augmented reality method according to any one of claims 32 to 35, characterized in that it comprises a step of initialization of the model of the face of the individual from at least one image of the individual does not not wearing the vision device on the face.
[37" id="c-fr-0037]
37. Augmented reality method according to any one of claims 32 to 35, characterized in that it comprises a step of initialization of the model of the face of the individual from a plurality of images of the individual wearing his vision device, his images corresponding to different angles of view of the face.
[38" id="c-fr-0038]
38. Augmented reality method according to any one of claims 32 to 37, characterized in that it comprises a step of initialization of the model of the vision device from at least one image of said device acquired in a vision device. dedicated modeling.
[39" id="c-fr-0039]
39. Augmented reality method according to any one of claims 32 to 37, characterized in that it comprises a step of initialization of the vision device model from at least one image of the individual wearing his device. of vision.
[40" id="c-fr-0040]
40. Augmented reality device allowing the fitting of a virtual object by an individual wearing a vision device, the virtual object at least partially covering the vision device, characterized in that it comprises:
at least one camera acquiring a video of the individual;
a unit for processing the acquired video, the processing unit at least partially obscuring the vision device on the majority or all of the images of the video by means of a method for generating an image according to any one of claims 3 to 31;
at least one screen displaying the processed video of the individual.
[41" id="c-fr-0041]
41. Augmented reality device according to claim 40, characterized in that the screen is vertical and in that the camera is fixed substantially in the plane of the screen.
[42" id="c-fr-0042]
42. Augmented reality device according to any one of claims 40 and 41, characterized in that it comprises two cameras spaced, parallel to an edge of the screen, from a distance between thirty and fifty centimeters.
[43" id="c-fr-0043]
43. Augmented reality device according to claim 42, characterized in that it further comprises a third camera located substantially on the median axis between the first two cameras.
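Purely as an illustration of the device layout of claims 40 to 43, the small configuration sketch below records two cameras spaced along an edge of the screen and an optional third camera on their median axis. The camera indices and the 0.40 m spacing are example values, not requirements of the claims.

```python
from dataclasses import dataclass
import cv2

@dataclass
class TryOnRig:
    """Illustrative description of the fitting device: a vertical screen with
    cameras fixed substantially in its plane."""
    camera_indices: tuple = (0, 1)   # two cameras along one edge of the screen
    median_camera_index: int = 2     # optional third camera on the median axis
    camera_spacing_m: float = 0.40   # between 0.30 m and 0.50 m per claim 42

    def open_cameras(self):
        captures = [cv2.VideoCapture(i) for i in self.camera_indices]
        captures.append(cv2.VideoCapture(self.median_camera_index))
        return [c for c in captures if c.isOpened()]
```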
[44" id="c-fr-0044]
44. Augmented reality device according to any one of claims 40 to 43, characterized in that the screen is a touchscreen.
[45" id="c-fr-0045]
45. Augmented reality device according to any one of claims 40 to 44, characterized in that the display of the acquired and modified video is performed in real time.
[46" id="c-fr-0046]
46. Augmented reality device according to any one of claims 40 to 45, characterized in that it comprises a device for acquiring the three-dimensional model of the vision device.
Similar technologies:
Publication number | Publication date | Patent title
EP3479344B1|2020-12-09|Method for concealing an object in an image or a video and associated augmented reality method
EP2526510B2|2021-09-08|Augmented reality method applied to the integration of a pair of spectacles into an image of a face
EP3401879B1|2021-02-17|Method for modelling a three-dimensional object from two-dimensional images of the object taken from different angles
CN106575450B|2019-07-26|It is rendered by the augmented reality content of albedo model, system and method
Stengel et al.2015|An affordable solution for binocular eye tracking and calibration in head-mounted displays
Delaunoy et al.2014|Photometric bundle adjustment for dense multi-view 3d modeling
EP2455916A1|2012-05-23|Non-rigid tracking-based human-machine interface
EP2760329A1|2014-08-06|Method for determining ocular and optical measurements
FR3011952A1|2015-04-17|METHOD OF INTERACTION BY LOOK AND ASSOCIATED DEVICE
FR3053502A1|2018-01-05|SYSTEM AND METHOD FOR DIGITAL MAKE-UP MIRROR
WO2019020521A1|2019-01-31|Method for determining at least one parameter associated with an ophthalmic device
FR2950984A1|2011-04-08|METHOD AND EQUIPMENT OF MEASUREMENTS FOR CUSTOMIZATION AND MOUNTING OF CORRECTIVE OPHTHALMIC LENSES
WO2018002533A1|2018-01-04|Method for concealing an object in an image or a video and associated augmented reality method
US10685457B2|2020-06-16|Systems and methods for visualizing eyewear on a user
FR3067151B1|2019-07-26|METHOD FOR REALISTIC VIRTUAL TRYING OF A PAIR OF EYEGLASSES BY AN INDIVIDUAL
CN110446968A|2019-11-12|For determining the method implemented by computer of centering parameter
Świrski2015|Gaze estimation on glasses-based stereoscopic displays
FR3066304A1|2018-11-16|METHOD OF COMPOSING AN IMAGE OF AN IMMERSION USER IN A VIRTUAL SCENE, DEVICE, TERMINAL EQUIPMENT, VIRTUAL REALITY SYSTEM AND COMPUTER PROGRAM
FR3086161A1|2020-03-27|AUTOMATIC DETERMINATION OF THE PARAMETERS NECESSARY FOR THE PRODUCTION OF GLASSES.
EP2081144A1|2009-07-22|Computerised system to assist the marketing of clothing items, in particular a fitting room with augmented reality
FR2986893A1|2013-08-16|SYSTEM FOR CREATING THREE-DIMENSIONAL REPRESENTATIONS FROM REAL MODELS HAVING SIMILAR AND PREDETERMINED CHARACTERISTICS
CN110462625A|2019-11-15|Face recognition device
WO2002063568A1|2002-08-15|Method and system for generating virtual and co-ordinated movements by sequencing viewpoints
Patent family:
Publication number | Publication date
EP3479344A1|2019-05-08|
US20180005448A1|2018-01-04|
KR102342982B1|2021-12-24|
KR20190021390A|2019-03-05|
JP2019527410A|2019-09-26|
FR3053509B1|2019-08-16|
EP3479344B1|2020-12-09|
CN109983501A|2019-07-05|
US9892561B2|2018-02-13|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title
WO2010042990A1|2008-10-16|2010-04-22|Seeing Machines Limited|Online marketing of facial products using real-time face tracking|
US20150055085A1|2013-08-22|2015-02-26|Bespoke, Inc.|Method and system to create products|
WO2016020921A1|2014-08-04|2016-02-11|Pebbles Ltd.|Method and system for reconstructing obstructed face portions for virtual reality environment|
WO2016050729A1|2014-09-30|2016-04-07|Thomson Licensing|Face inpainting using piece-wise affine warping and sparse coding|
EP1136869A1|2000-03-17|2001-09-26|Kabushiki Kaisha TOPCON|Eyeglass frame selecting system|
CN109288333B|2012-12-18|2021-11-30|艾斯适配有限公司|Apparatus, system and method for capturing and displaying appearance|
US20110071804A1|2007-02-21|2011-03-24|Yiling Xie|Method And The Associated Mechanism For 3-D Simulation Stored-Image Database-Driven Spectacle Frame Fitting Services Over Public Network|
US8553279B2|2007-07-04|2013-10-08|Samsung Electronics Co., Ltd|Image forming apparatus and a control method to improve image quality based on an edge pixel|
FR2955409B1|2010-01-18|2015-07-03|Fittingbox|METHOD FOR INTEGRATING A VIRTUAL OBJECT IN REAL TIME VIDEO OR PHOTOGRAPHS|
JP5648299B2|2010-03-16|2015-01-07|株式会社ニコン|Eyeglass sales system, lens company terminal, frame company terminal, eyeglass sales method, and eyeglass sales program|
US20130088490A1|2011-04-04|2013-04-11|Aaron Rasmussen|Method for eyewear fitting, recommendation, and customization using collision detection|
US20130262259A1|2012-04-02|2013-10-03|Yiling Xie|Method and System for Making Up Spectacles and Eyesight Testing through Public Network|
US9286715B2|2012-05-23|2016-03-15|Glasses.Com Inc.|Systems and methods for adjusting a virtual try-on|
US9378584B2|2012-05-23|2016-06-28|Glasses.Com Inc.|Systems and methods for rendering virtual try-on products|
US9817248B2|2014-12-23|2017-11-14|Multimedia Image Solution Limited|Method of virtually trying on eyeglasses|
CA2901477A1|2015-08-25|2017-02-25|Evolution Optiks Limited|Vision correction system, method and graphical user interface for implementation on electronic devices having a graphical display|
WO2018053703A1|2016-09-21|2018-03-29|Intel Corporation|Estimating accurate face shape and texture from an image|
CN108513668B|2016-12-29|2020-09-08|华为技术有限公司|Picture processing method and device|
USD821473S1|2017-01-14|2018-06-26|The VOID, LCC|Suiting station|
JP6855872B2|2017-03-24|2021-04-07|アイシン精機株式会社|Face recognition device|
US10777018B2|2017-05-17|2020-09-15|Bespoke, Inc.|Systems and methods for determining the scale of human anatomy from images|
CN107808120B|2017-09-30|2018-08-31|平安科技(深圳)有限公司|Glasses localization method, device and storage medium|
CN109697749A|2017-10-20|2019-04-30|虹软科技股份有限公司|A kind of method and apparatus for three-dimensional modeling|
KR20190101835A|2018-02-23|2019-09-02|삼성전자주식회사|Electronic device providing image including 3d avatar in which motion of face is reflected by using 3d avatar corresponding to face and method for operating thefeof|
US10977767B2|2018-11-28|2021-04-13|Adobe Inc.|Propagation of spot healing edits from one image to multiple images|
US10825260B2|2019-01-04|2020-11-03|Jand, Inc.|Virtual try-on systems and methods for spectacles|
CN109829867A|2019-02-12|2019-05-31|西南石油大学|It is a kind of to restrain sample block restorative procedure for the spherical shape for stablizing filling|
KR20200101630A|2019-02-20|2020-08-28|삼성전자주식회사|Method for controlling avatar display and electronic device thereof|
AT521699B1|2019-02-21|2020-04-15|Silhouette Int Schmied Ag|Method for determining the optical center of the glasses of a pair of glasses to be made for a person wearing glasses|
CN109919876B|2019-03-11|2020-09-01|四川川大智胜软件股份有限公司|Three-dimensional real face modeling method and three-dimensional real face photographing system|
WO2020214897A1|2019-04-18|2020-10-22|Beckman Coulter, Inc.|Securing data of objects in a laboratory environment|
WO2020225756A1|2019-05-06|2020-11-12|CareOS|Smart mirror system and methods of use thereof|
WO2021072162A1|2019-10-11|2021-04-15|Swimc Llc|Augmentation of digital images with simulated surface coatings|
WO2021122387A1|2019-12-19|2021-06-24|Essilor International|Apparatus, method, and computer-readable storage medium for expanding an image database for evaluation of eyewear compatibility|
EP3843043A1|2019-12-23|2021-06-30|Essilor International|Apparatus, method, and computer-readable storage medium for expanding an image database for evaluation of eyewear compatibility|
KR102222599B1|2020-05-19|2021-03-04|웅진씽크빅|System and method for supporting reading by linking additional content to book|
KR102196794B1|2020-05-19|2020-12-30|웅진씽크빅|System and method for supporting reading by linking additional content to book|
KR102177384B1|2020-05-19|2020-11-12|웅진씽크빅|System and method for supporting reading by linking additional content to book|
CN112258389B|2020-12-23|2021-11-02|北京沃东天骏信息技术有限公司|Virtual reloading method and related equipment|
Legal status:
2017-06-30| PLFP| Fee payment|Year of fee payment: 2 |
2018-01-05| PLSC| Publication of the preliminary search report|Effective date: 20180105 |
2018-06-29| PLFP| Fee payment|Year of fee payment: 3 |
2019-06-27| PLFP| Fee payment|Year of fee payment: 4 |
2020-06-30| PLFP| Fee payment|Year of fee payment: 5 |
2021-06-30| PLFP| Fee payment|Year of fee payment: 6 |
Priority:
Application number | Filing date | Patent title
FR1656154A|FR3053509B1|2016-06-30|2016-06-30|METHOD FOR OCCULATING AN OBJECT IN AN IMAGE OR A VIDEO AND ASSOCIATED AUGMENTED REALITY METHOD|
FR1656154|2016-06-30|FR1656154A| FR3053509B1|2016-06-30|2016-06-30|METHOD FOR OCCULATING AN OBJECT IN AN IMAGE OR A VIDEO AND ASSOCIATED AUGMENTED REALITY METHOD|
US15/285,554| US9892561B2|2016-06-30|2016-10-05|Method of hiding an object in an image or video and associated augmented reality process|
EP17742822.4A| EP3479344B1|2016-06-30|2017-06-29|Method for concealing an object in an image or a video and associated augmented reality method|
KR1020197002130A| KR102342982B1|2016-06-30|2017-06-29|Methods and related augmented reality methods for concealing objects in images or videos|
CN201780053200.5A| CN109983501A|2016-06-30|2017-06-29|The method and relevant augmented reality method of a kind of hidden image or the object in video|
JP2018569126A| JP2019527410A|2016-06-30|2017-06-29|Method for hiding objects in images or videos and related augmented reality methods|
PCT/FR2017/051744| WO2018002533A1|2016-06-30|2017-06-29|Method for concealing an object in an image or a video and associated augmented reality method|